About

Project: FL100

About: Preproccess the data by determining the best prevalence thresholds to use. Run “standard” microbiome analysis focusing on plasma TMAO. This includes alpha and beta diversity, ANCOM differential abundance, and DESeq2 differential abundance analyses.

Inputs: Inputs include:

  1. TMAO phyloseq object. Recall, the data from Dr. Kable was filtered to taxa with 2% abundance, which removes technical noise. Therefore, the remaining data is believable. Further filtering we do will be driven by our interest in more prevalent taxa. Furthermore, we should not recalculate the alpha diversity statistics after our filtering because that would incorrectly describe the microbiome environment.
  2. Master phenotype data

Outputs: Outputs include:

  1. Plots per section [expand as analysis goes final]
  2. Tables [expand as analysis goes final]

Sources:

  1. Example using Negative Binomial in Microbiome Differential Abundance Testing
  2. Workflow for Microbiome Data Analysis: from raw reads to community analyses
  3. Example of what descriptive microbiome stats to report in the results

Updates: On 01-19-2022 I reran many of the analyses with the upadted phyloseq object PSOtmao_NA to see if my results were consistent. I saved this markdown to the update-taxonomy-table branch of the GitHub FL100_TMAO project. Background - When MK was reading my manuscript, she noticed that I have some entries in the taxonomy table that say NA and some that say "g__" or "f“. While these may have different meanings, she suggests simply changing all cases to NA. (Another solution, you could manually differentaite”g" to "g__ASV1" so that the glomming isn’t wrong).

Set Up

library(DESeq2)
library(phyloseq)
library(ggplot2)
library(dplyr)
library(vegan)
#load("../../data/processed/microbiome/phyloseq_objects/PSOtmao_211108.RData") #PSOtmao
load(file = "../../data/processed/microbiome/phyloseq_objects/PSOtmao_220119.RData") #PSOtmao_NA
PSOtmao <- PSOtmao_NA
# Add age_cat
sample_data(PSOtmao)$age_cat <- ifelse(sample_data(PSOtmao)$age < 34, 1,
                         ifelse(sample_data(PSOtmao)$age >= 34 & sample_data(PSOtmao)$age < 50, 2, 3))
sample_data(PSOtmao)$age_cat <- as.factor(sample_data(PSOtmao)$age_cat)
# Add bmi_cat
sample_data(PSOtmao)$bmi_cat <- ifelse(sample_data(PSOtmao)$bmi_final.x < 25, 1,
                         ifelse(sample_data(PSOtmao)$bmi_final.x >= 25 & sample_data(PSOtmao)$bmi_final.x < 30, 2, 3))
sample_data(PSOtmao)$bmi_cat <- as.factor(sample_data(PSOtmao)$bmi_cat)

# Factor sex, BMI okay
sample_data(PSOtmao)$sex = factor(sample_data(PSOtmao)$sex)

# Make sure "below" is the reference factor
str(sample_data(PSOtmao)$tmao_mdn)
##  Factor w/ 2 levels "above","below": 2 2 1 2 1 2 2 2 2 1 ...
sample_data(PSOtmao)$tmao_mdn <- relevel(sample_data(PSOtmao)$tmao_mdn, ref = "below")

# Add TMAO quantiles
tmao_quantile <- quantile(sample_data(PSOtmao)$tmao)
sample_data(PSOtmao)$tmao_quantile <- ifelse(sample_data(PSOtmao)$tmao <= tmao_quantile[2], "Quantile1",
                             ifelse(sample_data(PSOtmao)$tmao > tmao_quantile[2] & sample_data(PSOtmao)$tmao <= tmao_quantile[3], "Quantile2",
                                   ifelse(sample_data(PSOtmao)$tmao > tmao_quantile[3] & sample_data(PSOtmao)$tmao <= tmao_quantile[4], "Quantile3", "Quantile4")))

# Factor it
sample_data(PSOtmao)$tmao_quantile = factor(sample_data(PSOtmao)$tmao_quantile)

# Add TMAO tertiles
tmao_tertile <- quantile(sample_data(PSOtmao)$tmao, c(0:3/3))
sample_data(PSOtmao)$tmao_tertile <- ifelse(sample_data(PSOtmao)$tmao <= tmao_tertile[2], "Tertile1",
                             ifelse(sample_data(PSOtmao)$tmao > tmao_tertile[2] & sample_data(PSOtmao)$tmao <= tmao_tertile[3], "Tertile2", "Tertile3"))
# Factor it
sample_data(PSOtmao)$tmao_tertile = factor(sample_data(PSOtmao)$tmao_tertile)
# Lookss good
table(sample_data(PSOtmao)$tmao_tertile)
## 
## Tertile1 Tertile2 Tertile3 
##      119      118      118

Save taxonomy table for manuscript.

tax_manuscript <- tax_table(PSOtmao)
write.csv(tax_manuscript, "../../Outputs/tables/Microbiome/PSOtmao_taxonomyTable.csv")

Preproccess

Table of read counts at the Phylum level. We will filter low abundant Phyla. We will also require that taxa are present in at least 10% of samples (at least 36 samples). We will then agglomerate at the Genus level.

print("The number of different bacteria at each phyla:")
## [1] "The number of different bacteria at each phyla:"
table(tax_table(PSOtmao)[, "Phylum"])
## 
##   p__Actinobacteria    p__Bacteroidetes    p__Cyanobacteria    p__Elusimicrobia 
##                  45                 575                   8                   1 
##       p__Firmicutes     p__Fusobacteria   p__Proteobacteria     p__Spirochaetes 
##                2042                   4                  44                   1 
##    p__Synergistetes      p__Tenericutes  p__Verrucomicrobia 
##                   1                  60                   5
# Prevalence - how many samples the taxa is observed in
prevdf = apply(X = otu_table(PSOtmao),
               MARGIN = ifelse(taxa_are_rows(PSOtmao), yes = 1, no = 2),
               FUN = function(x){sum(x > 0)})

prevdf = data.frame(Prevalence = prevdf,
                    TotalAbundance = taxa_sums(PSOtmao), # the total number of occurrences (even if only observed 50 times in 1 sample, the number would be 50)
                    tax_table(PSOtmao))

print("Table showing prevalence, total abundance, and taxonomy:")
## [1] "Table showing prevalence, total abundance, and taxonomy:"
prevdf[1:20,]
##                                  Prevalence TotalAbundance     Kingdom
## 77a72e43b8d0143ee2d29c79be1b00da         25           7590 k__Bacteria
## 909ae41a723a8fd891fe419d479eedee          1             67 k__Bacteria
## ede1221982486d3a7b845b3d9b52d143          3            612 k__Bacteria
## 20f39b86c19b9d76d340da961091b85c          2            127 k__Bacteria
## a735c8388a8121dd152ff23c4542b704          1            153 k__Bacteria
## 22138725bb36874fd5109e9a0070f2f8          2             93 k__Bacteria
## 8bf97afdd2f9c12017be818206946c22          1             48 k__Bacteria
## 88efe73ef81626a8a150533dea375e82          1            151 k__Bacteria
## 54fe5a27808d4d09c806b4c562a27601          2             59 k__Bacteria
## 2b0390d3c06b1bbe94ec48f3b036a500          1             52 k__Bacteria
## aed8608657afe922ec14430de7b48f3e          1            198 k__Bacteria
## 89d1dade4f693199e3aed179f37508d1          1             87 k__Bacteria
## 6e6f0f914f6175cd071197f97b9672c4         43            558 k__Bacteria
## bf1343d2ee6d94673c0907bc566ea9d9         76            858 k__Bacteria
## f1d7ac8c18c1d8144a00e4d785c86e4e         47            529 k__Bacteria
## 484ead003b64fe20c520597fa7797d23         34           1128 k__Bacteria
## 344bc48fbf2d6572954c51cdf104061b        118           2177 k__Bacteria
## 3520cbb42eb18c7d0889d4acd1fb17c1          1             41 k__Bacteria
## 5adf9526730ed772ee3ef6cd39caeee6          2             81 k__Bacteria
## f28bcb82096cccf4d2334f12b76c3d8e        126           4355 k__Bacteria
##                                             Phylum           Class
## 77a72e43b8d0143ee2d29c79be1b00da  p__Bacteroidetes  c__Bacteroidia
## 909ae41a723a8fd891fe419d479eedee  p__Bacteroidetes  c__Bacteroidia
## ede1221982486d3a7b845b3d9b52d143  p__Bacteroidetes  c__Bacteroidia
## 20f39b86c19b9d76d340da961091b85c  p__Bacteroidetes  c__Bacteroidia
## a735c8388a8121dd152ff23c4542b704  p__Bacteroidetes  c__Bacteroidia
## 22138725bb36874fd5109e9a0070f2f8  p__Bacteroidetes  c__Bacteroidia
## 8bf97afdd2f9c12017be818206946c22  p__Bacteroidetes  c__Bacteroidia
## 88efe73ef81626a8a150533dea375e82  p__Bacteroidetes  c__Bacteroidia
## 54fe5a27808d4d09c806b4c562a27601  p__Bacteroidetes  c__Bacteroidia
## 2b0390d3c06b1bbe94ec48f3b036a500  p__Bacteroidetes  c__Bacteroidia
## aed8608657afe922ec14430de7b48f3e  p__Bacteroidetes  c__Bacteroidia
## 89d1dade4f693199e3aed179f37508d1  p__Bacteroidetes  c__Bacteroidia
## 6e6f0f914f6175cd071197f97b9672c4  p__Bacteroidetes  c__Bacteroidia
## bf1343d2ee6d94673c0907bc566ea9d9  p__Bacteroidetes  c__Bacteroidia
## f1d7ac8c18c1d8144a00e4d785c86e4e  p__Bacteroidetes  c__Bacteroidia
## 484ead003b64fe20c520597fa7797d23  p__Bacteroidetes  c__Bacteroidia
## 344bc48fbf2d6572954c51cdf104061b  p__Bacteroidetes  c__Bacteroidia
## 3520cbb42eb18c7d0889d4acd1fb17c1  p__Bacteroidetes  c__Bacteroidia
## 5adf9526730ed772ee3ef6cd39caeee6  p__Bacteroidetes  c__Bacteroidia
## f28bcb82096cccf4d2334f12b76c3d8e  p__Bacteroidetes  c__Bacteroidia
##                                              Order             Family
## 77a72e43b8d0143ee2d29c79be1b00da  o__Bacteroidales  f__Prevotellaceae
## 909ae41a723a8fd891fe419d479eedee  o__Bacteroidales  f__Prevotellaceae
## ede1221982486d3a7b845b3d9b52d143  o__Bacteroidales  f__Prevotellaceae
## 20f39b86c19b9d76d340da961091b85c  o__Bacteroidales               <NA>
## a735c8388a8121dd152ff23c4542b704  o__Bacteroidales               <NA>
## 22138725bb36874fd5109e9a0070f2f8  o__Bacteroidales               <NA>
## 8bf97afdd2f9c12017be818206946c22  o__Bacteroidales               <NA>
## 88efe73ef81626a8a150533dea375e82  o__Bacteroidales               <NA>
## 54fe5a27808d4d09c806b4c562a27601  o__Bacteroidales               <NA>
## 2b0390d3c06b1bbe94ec48f3b036a500  o__Bacteroidales               <NA>
## aed8608657afe922ec14430de7b48f3e  o__Bacteroidales               <NA>
## 89d1dade4f693199e3aed179f37508d1  o__Bacteroidales   f__Rikenellaceae
## 6e6f0f914f6175cd071197f97b9672c4  o__Bacteroidales   f__Rikenellaceae
## bf1343d2ee6d94673c0907bc566ea9d9  o__Bacteroidales   f__Rikenellaceae
## f1d7ac8c18c1d8144a00e4d785c86e4e  o__Bacteroidales   f__Rikenellaceae
## 484ead003b64fe20c520597fa7797d23  o__Bacteroidales   f__Rikenellaceae
## 344bc48fbf2d6572954c51cdf104061b  o__Bacteroidales   f__Rikenellaceae
## 3520cbb42eb18c7d0889d4acd1fb17c1  o__Bacteroidales   f__Rikenellaceae
## 5adf9526730ed772ee3ef6cd39caeee6  o__Bacteroidales   f__Rikenellaceae
## f28bcb82096cccf4d2334f12b76c3d8e  o__Bacteroidales   f__Rikenellaceae
##                                          Genus          Species
## 77a72e43b8d0143ee2d29c79be1b00da          <NA>             <NA>
## 909ae41a723a8fd891fe419d479eedee          <NA>             <NA>
## ede1221982486d3a7b845b3d9b52d143          <NA>             <NA>
## 20f39b86c19b9d76d340da961091b85c          <NA>             <NA>
## a735c8388a8121dd152ff23c4542b704          <NA>             <NA>
## 22138725bb36874fd5109e9a0070f2f8          <NA>             <NA>
## 8bf97afdd2f9c12017be818206946c22          <NA>             <NA>
## 88efe73ef81626a8a150533dea375e82          <NA>             <NA>
## 54fe5a27808d4d09c806b4c562a27601          <NA>             <NA>
## 2b0390d3c06b1bbe94ec48f3b036a500          <NA>             <NA>
## aed8608657afe922ec14430de7b48f3e          <NA>             <NA>
## 89d1dade4f693199e3aed179f37508d1  g__Rikenella             <NA>
## 6e6f0f914f6175cd071197f97b9672c4          <NA>             <NA>
## bf1343d2ee6d94673c0907bc566ea9d9          <NA>             <NA>
## f1d7ac8c18c1d8144a00e4d785c86e4e  g__Alistipes  s__indistinctus
## 484ead003b64fe20c520597fa7797d23          <NA>             <NA>
## 344bc48fbf2d6572954c51cdf104061b          <NA>             <NA>
## 3520cbb42eb18c7d0889d4acd1fb17c1          <NA>             <NA>
## 5adf9526730ed772ee3ef6cd39caeee6  g__Alistipes    s__finegoldii
## f28bcb82096cccf4d2334f12b76c3d8e          <NA>             <NA>
# Make table with mean prevalence, total prevalence
print("Table showing mean prevalencce and total prevalence per Phylum:")
## [1] "Table showing mean prevalencce and total prevalence per Phylum:"
plyr::ddply(prevdf, "Phylum", function(df_mb){cbind("Mean" = mean(df_mb$Prevalence),"Sum" = sum(df_mb$Prevalence))})
##                 Phylum      Mean   Sum
## 1    p__Actinobacteria 32.733333  1473
## 2     p__Bacteroidetes 12.624348  7259
## 3     p__Cyanobacteria  5.125000    41
## 4     p__Elusimicrobia  2.000000     2
## 5        p__Firmicutes 20.987757 42857
## 6      p__Fusobacteria  3.250000    13
## 7    p__Proteobacteria 21.159091   931
## 8      p__Spirochaetes  1.000000     1
## 9     p__Synergistetes 23.000000    23
## 10      p__Tenericutes  5.966667   358
## 11  p__Verrucomicrobia 35.000000   175

From the phylum table, we can see that p__Elusimicrobia, p__Spirochaetes, and p__Synergistetes have the same mean and sum, meaning they are only present in 1 sample. We are interested and confident in the more abundant taxa so manually filter them out. Per Dr. Kable’s expert advice, she already filtered to 2% prevalence to remove technical error. We will increase the filtering to 5%, a common threshold in the literature.

# Define prevalence threshold as 5% of total samples
prevalenceThreshold <- 0.05 * nsamples(PSOtmao)
cat("The 10% prevalence threshold is", prevalenceThreshold, "samples.", "\n")
## The 10% prevalence threshold is 17.75 samples.
# Execute prevalence filter, using `prune_taxa()` function
keepTaxa = rownames(prevdf)[(prevdf$Prevalence >= prevalenceThreshold)]
PSOtmao1 = prune_taxa(keepTaxa, PSOtmao)

print("The number of different bacteria at each phyla:")
## [1] "The number of different bacteria at each phyla:"
table(tax_table(PSOtmao1)[, "Phylum"])
## 
##   p__Actinobacteria    p__Bacteroidetes       p__Firmicutes   p__Proteobacteria 
##                  15                  67                 407                  11 
##    p__Synergistetes      p__Tenericutes  p__Verrucomicrobia 
##                   1                   4                   3
# Look at phylum level distributions
ggplot(prevdf, aes(TotalAbundance, Prevalence / nsamples(PSOtmao1),color=Phylum)) +
  geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) +  # Include a guess for parameter
  geom_point(size = 2, alpha = 0.7) +
  scale_x_log10() +  
  xlab("Total Abundance") + 
  ylab("Prevalence [Frac. Samples]") +
  facet_wrap(~Phylum) + 
  theme(legend.position="none")

# Agglomerate taxa
PSOtmao1G = tax_glom(PSOtmao1, "Genus", NArm = TRUE)

Great. Do your due diligence and inspect each phyloseq object to see how the filtering and glomming steps changed the number of taxonomy.

# Inspect the effect of each filtering step on the phyloseq object
PSOtmao
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 2786 taxa and 355 samples ]
## sample_data() Sample Data:       [ 355 samples by 64 sample variables ]
## tax_table()   Taxonomy Table:    [ 2786 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 2786 tips and 2730 internal nodes ]
PSOtmao1
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 508 taxa and 355 samples ]
## sample_data() Sample Data:       [ 355 samples by 64 sample variables ]
## tax_table()   Taxonomy Table:    [ 508 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 508 tips and 502 internal nodes ]
PSOtmao1G
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 52 taxa and 355 samples ]
## sample_data() Sample Data:       [ 355 samples by 64 sample variables ]
## tax_table()   Taxonomy Table:    [ 52 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 52 tips and 51 internal nodes ]

Look at Genus level distributions of the bugs.

# Prevalence - how many samples the taxa is observed in
prevdf = apply(X = otu_table(PSOtmao1G),
               MARGIN = ifelse(taxa_are_rows(PSOtmao1G), yes = 1, no = 2),
               FUN = function(x){sum(x > 0)})

prevdf = data.frame(Prevalence = prevdf,
                    TotalAbundance = taxa_sums(PSOtmao1G), # the total number of occurrences (even if only observed 50 times in 1 sample, the number would be 50)
                    tax_table(PSOtmao1G))

print("Table showing prevalence, total abundance, and taxonomy:")
## [1] "Table showing prevalence, total abundance, and taxonomy:"
prevdf[1:10,]
##                                  Prevalence TotalAbundance     Kingdom
## f1d7ac8c18c1d8144a00e4d785c86e4e         47            529 k__Bacteria
## 05404c9fdf9f3f334eb618bac3f434f6        228           4099 k__Bacteria
## 06105df60508c2ed24a54f1b8ed64e49        326          47062 k__Bacteria
## bb1b75f41ff9c9db1d1de41e8388eb52        355         640634 k__Bacteria
## 1efe70e365249c0d2fc28580b6ba0529         56           4097 k__Bacteria
## 098c3bbd8234f4ac198297ac0bde957d        178         499064 k__Bacteria
## db8f48a3fe2fca95fb4986f5507b9076        297          52483 k__Bacteria
## 3f5533292a655d0f54a64560d65e823b        271          46075 k__Bacteria
## 7017e1746ae742d69c50c1274cc2f02e         94           1314 k__Bacteria
## 803eb52cfe3d77bdf9fe14c011e425fb         98           1739 k__Bacteria
##                                              Phylum              Class
## f1d7ac8c18c1d8144a00e4d785c86e4e   p__Bacteroidetes     c__Bacteroidia
## 05404c9fdf9f3f334eb618bac3f434f6   p__Bacteroidetes     c__Bacteroidia
## 06105df60508c2ed24a54f1b8ed64e49   p__Bacteroidetes     c__Bacteroidia
## bb1b75f41ff9c9db1d1de41e8388eb52   p__Bacteroidetes     c__Bacteroidia
## 1efe70e365249c0d2fc28580b6ba0529   p__Bacteroidetes     c__Bacteroidia
## 098c3bbd8234f4ac198297ac0bde957d   p__Bacteroidetes     c__Bacteroidia
## db8f48a3fe2fca95fb4986f5507b9076  p__Actinobacteria  c__Actinobacteria
## 3f5533292a655d0f54a64560d65e823b  p__Actinobacteria  c__Coriobacteriia
## 7017e1746ae742d69c50c1274cc2f02e  p__Actinobacteria  c__Coriobacteriia
## 803eb52cfe3d77bdf9fe14c011e425fb  p__Actinobacteria  c__Coriobacteriia
##                                                  Order                   Family
## f1d7ac8c18c1d8144a00e4d785c86e4e      o__Bacteroidales         f__Rikenellaceae
## 05404c9fdf9f3f334eb618bac3f434f6      o__Bacteroidales    f__[Odoribacteraceae]
## 06105df60508c2ed24a54f1b8ed64e49      o__Bacteroidales    f__Porphyromonadaceae
## bb1b75f41ff9c9db1d1de41e8388eb52      o__Bacteroidales        f__Bacteroidaceae
## 1efe70e365249c0d2fc28580b6ba0529      o__Bacteroidales  f__[Paraprevotellaceae]
## 098c3bbd8234f4ac198297ac0bde957d      o__Bacteroidales        f__Prevotellaceae
## db8f48a3fe2fca95fb4986f5507b9076  o__Bifidobacteriales    f__Bifidobacteriaceae
## 3f5533292a655d0f54a64560d65e823b   o__Coriobacteriales     f__Coriobacteriaceae
## 7017e1746ae742d69c50c1274cc2f02e   o__Coriobacteriales     f__Coriobacteriaceae
## 803eb52cfe3d77bdf9fe14c011e425fb   o__Coriobacteriales     f__Coriobacteriaceae
##                                                Genus Species
## f1d7ac8c18c1d8144a00e4d785c86e4e        g__Alistipes    <NA>
## 05404c9fdf9f3f334eb618bac3f434f6      g__Odoribacter    <NA>
## 06105df60508c2ed24a54f1b8ed64e49  g__Parabacteroides    <NA>
## bb1b75f41ff9c9db1d1de41e8388eb52      g__Bacteroides    <NA>
## 1efe70e365249c0d2fc28580b6ba0529   g__Paraprevotella    <NA>
## 098c3bbd8234f4ac198297ac0bde957d       g__Prevotella    <NA>
## db8f48a3fe2fca95fb4986f5507b9076  g__Bifidobacterium    <NA>
## 3f5533292a655d0f54a64560d65e823b      g__Collinsella    <NA>
## 7017e1746ae742d69c50c1274cc2f02e    g__Adlercreutzia    <NA>
## 803eb52cfe3d77bdf9fe14c011e425fb      g__Eggerthella    <NA>
# Family
ggplot(prevdf, aes(TotalAbundance, Prevalence / nsamples(PSOtmao1G),color=Family)) +
  geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) + # Include a guess for parameter
  geom_point(size = 1, alpha = 0.7) +
  scale_x_log10() +
  xlab("Total Abundance") +
  ylab("Prevalence [Frac. Samples]") +
  facet_wrap(~Family) +
  theme(legend.position="none")

# Genus - have to blow it up to make it useful
ggplot(prevdf, aes(TotalAbundance, Prevalence / nsamples(PSOtmao1G),color=Genus)) +
  geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) + # Include a guess for parameter
  geom_point(size = 1, alpha = 0.7) +
  scale_x_log10() +
  xlab("Total Abundance") +
  ylab("Prevalence [Frac. Samples]") +
  facet_wrap(~Genus) +
  theme(legend.position="none")

Look for Outliers Using PCoA

We want to identify outlier taxa that might drive the statistical analyses. One way to do this is to look for outliers in the PCoA plot.

PCoA_BC <- ordinate(PSOtmao1G, "PCoA", "bray")

# BC
# taxa is similar to species
taxa_plot <- plot_ordination(PSOtmao1G, PCoA_BC, type = "taxa")
taxa_plot

# It looks like there is an outlier in the top left corner of the plot. 
# Identify which taxa that bacteria is from and remove it. 
tp_data <- taxa_plot$data
plot(tp_data$Axis.2) # yes an outlier!

# Who is it?
outlier <- tp_data[tp_data$Axis.2 == max(tp_data$Axis.2), ] 
cat("Which genus is the outlier?", outlier$Genus, "\n")
## Which genus is the outlier?  g__Prevotella

Super interesting! The “outlier” is prevotella, which is interesting in the microbiome community as a marker of individuals who consume/live/have non-industrialized diets/environments/guts. We won’t remove this bug afterall.

Save

save(PSOtmao1, file = "../../data/processed/microbiome/phyloseq_objects/PSOtmao1_211118.RData")
save(PSOtmao1G, file = "../../data/processed/microbiome/phyloseq_objects/PSOtmao1G_211118.RData")

Phylogenetic Abundance plots

Family Plot of TMAO Quantile

# Agglomerate taxa
PSOtmao1F = tax_glom(PSOtmao1, "Family", NArm = TRUE)
PSOtmao1F.top10 <- microbiome::aggregate_top_taxa(PSOtmao1F, "Family", top = 10)
## Warning: 'microbiome::aggregate_top_taxa' is deprecated.
## Use 'aggregate_rare' instead.
## See help("Deprecated") and help("The microbiome::aggregate_top_taxa function is deprecated.-deprecated").
ps1F.10.family.comp <- microbiome::transform(PSOtmao1F.top10, transform="compositional")

plot.composition.relAbun <- microbiome::plot_composition(ps1F.10.family.comp,
                                             sample.sort = "Description",
                                             x.label = "tmao_quantile",
                                             group_by = "tmao_quantile",
                                             average_by = "tmao_quantile") +
  theme_minimal() +
  scale_fill_brewer("Family", palette = "Paired") + 
  ggtitle("Relative Abundance at Family Level")

plot.composition.relAbun

Family Plot of TMAO Tertile

# Agglomerate taxa
PSOtmao1F = tax_glom(PSOtmao1, "Family", NArm = TRUE)
PSOtmao1F.top10 <- microbiome::aggregate_top_taxa(PSOtmao1F, "Family", top = 10)
## Warning: 'microbiome::aggregate_top_taxa' is deprecated.
## Use 'aggregate_rare' instead.
## See help("Deprecated") and help("The microbiome::aggregate_top_taxa function is deprecated.-deprecated").
ps1F.10.family.comp <- microbiome::transform(PSOtmao1F.top10, transform="compositional") #compositional

plot.composition.relAbun <- microbiome::plot_composition(ps1F.10.family.comp,
                                             sample.sort = "Description",
                                             x.label = "tmao_tertile",
                                             group_by = "tmao_tertile",
                                             average_by = "tmao_tertile") +
  theme_minimal() +
  upstartr::scale_y_percent() + 
  scale_fill_brewer("Family", palette = "Paired") + 
  labs(x= "TMAO Tertiles", y = "Relative Abundance (%)", 
       title = "Relative Abundance of Top 10 Prevalent Families")

plot.composition.relAbun

# Save
tiff("../../Outputs/plots/microbiome/RelAbund_Fam_TMAOTertiles_2111.tiff", width = 4, height = 4, units = 'in', res = 300)
plot.composition.relAbun
dev.off()
## quartz_off_screen 
##                 2

DESeq2

From the DESeq2 documentation, “Note: In order to benefit from the default settings of the package, you should put the variable of interest at the end of the formula and make sure the control level is the first level.”

From Dr. Kable’s example,

DESeq2 Set Up Test the impact of fiber cluster, accounting for covariates, using the log ratio test

dds.ratio.batcho = phyloseq_to_deseq2(ps.genus, ~ Batch + k4_cluster) This performs a likelihood ratio test, which determines if the increased likelihood of the data in the full model, where we include all terms, is more than expected if the terms excluded in the reduced model are really zero.

dds.ratio.batcho <- DESeq(dds.ratio.batcho, test=“LRT”, reduced = ~ Batch )

The p-value is based on the LRT (so the same for everything), but the contrasts give the right log2 fold change.

resultsNames(dds.ratio.batcho) res.1v2.batcho <- results(dds.ratio.batcho, contrast=c(“k4_cluster”, “2”, “1”))

Factored by TMAO Median

#~~~~~~~~~~~~~~~
# DESeq analysis
# TMAO median
#~~~~~~~~~~~~~~~
mb <- phyloseq_to_deseq2(PSOtmao1G, ~ sex + age + bmi_final.x + tmao_mdn)
## converting counts to integer mode
##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.
mb.ratio <- DESeq(mb, test = "LRT", reduced = ~ sex + age + bmi_final.x) # remove variable of interest here, TMAO
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
res.ratio <- results(mb.ratio)
alpha = 0.05
alpha.big = 0.5
sigtab = res.ratio[which(res.ratio$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(PSOtmao1G)[rownames(sigtab), ], "matrix"))
sigtab$family_genus <- paste0(sigtab$Family, sigtab$Genus)
head(sigtab)
##                                     baseMean log2FoldChange     lfcSE     stat
## c5859b7f5d14b8c145fbc3ad583c70e8    8.111452     -0.3746158 0.5523605 443.8409
## 5c4ca852b40641b3eb0ad23e69bb6583 1366.620472     -0.5074813 0.1523118  11.3455
##                                        pvalue         padj     Kingdom
## c5859b7f5d14b8c145fbc3ad583c70e8 1.579538e-98 8.213598e-97 k__Bacteria
## 5c4ca852b40641b3eb0ad23e69bb6583 7.563101e-04 1.966406e-02 k__Bacteria
##                                          Phylum               Class
## c5859b7f5d14b8c145fbc3ad583c70e8  p__Firmicutes  c__Erysipelotrichi
## 5c4ca852b40641b3eb0ad23e69bb6583  p__Firmicutes       c__Clostridia
##                                                   Order                  Family
## c5859b7f5d14b8c145fbc3ad583c70e8  o__Erysipelotrichales  f__Erysipelotrichaceae
## 5c4ca852b40641b3eb0ad23e69bb6583       o__Clostridiales      f__Lachnospiraceae
##                                              Genus Species
## c5859b7f5d14b8c145fbc3ad583c70e8  g__Coprobacillus    <NA>
## 5c4ca852b40641b3eb0ad23e69bb6583      g__Roseburia    <NA>
##                                                              family_genus
## c5859b7f5d14b8c145fbc3ad583c70e8  f__Erysipelotrichaceae g__Coprobacillus
## 5c4ca852b40641b3eb0ad23e69bb6583          f__Lachnospiraceae g__Roseburia

results when we look at TMAO median and control for age, sex, and BMI.

Factored by TMAO Tertile

# DESeq analysis - tmao_tertile
mb <- phyloseq_to_deseq2(PSOtmao1G, ~ sex + age + bmi_final.x + tmao_tertile)
## converting counts to integer mode
##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.
mb.ratio <- DESeq(mb, test = "LRT", reduced = ~ sex + age + bmi_final.x) # remove variable of interest here, TMAO
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## 1 rows did not converge in beta, labelled in mcols(object)$fullBetaConv. Use larger maxit argument with nbinomLRT
res.ratio <- results(mb.ratio, contrast = c("tmao_tertile", "Tertile1", "Tertile3")) # Look at most different tertiles
alpha = 0.05
sigtab = res.ratio[which(res.ratio$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(PSOtmao1G)[rownames(sigtab), ], "matrix"))
sigtab$family_genus <- paste0(sigtab$Family, sigtab$Genus)
head(sigtab)
##                                     baseMean log2FoldChange     lfcSE      stat
## a0798db0c868e9ad6089502956d68d87  386.824697     0.01846139 3.6502531  12.62628
## c5859b7f5d14b8c145fbc3ad583c70e8    8.111452     1.20626399 0.6754924 444.00883
## 5c4ca852b40641b3eb0ad23e69bb6583 1366.620472     0.72857487 0.1846848  16.75662
## 8f5ceded1cd7e86b8271d3fab24d322a   16.450103     4.57944146 2.2594975  49.63895
##                                        pvalue         padj     Kingdom
## a0798db0c868e9ad6089502956d68d87 1.812332e-03 2.356032e-02 k__Bacteria
## c5859b7f5d14b8c145fbc3ad583c70e8 3.843330e-97 1.998532e-95 k__Bacteria
## 5c4ca852b40641b3eb0ad23e69bb6583 2.297978e-04 3.983162e-03 k__Bacteria
## 8f5ceded1cd7e86b8271d3fab24d322a 1.663564e-11 4.325266e-10 k__Bacteria
##                                          Phylum               Class
## a0798db0c868e9ad6089502956d68d87  p__Firmicutes  c__Erysipelotrichi
## c5859b7f5d14b8c145fbc3ad583c70e8  p__Firmicutes  c__Erysipelotrichi
## 5c4ca852b40641b3eb0ad23e69bb6583  p__Firmicutes       c__Clostridia
## 8f5ceded1cd7e86b8271d3fab24d322a  p__Firmicutes       c__Clostridia
##                                                   Order                  Family
## a0798db0c868e9ad6089502956d68d87  o__Erysipelotrichales  f__Erysipelotrichaceae
## c5859b7f5d14b8c145fbc3ad583c70e8  o__Erysipelotrichales  f__Erysipelotrichaceae
## 5c4ca852b40641b3eb0ad23e69bb6583       o__Clostridiales      f__Lachnospiraceae
## 8f5ceded1cd7e86b8271d3fab24d322a       o__Clostridiales      f__Lachnospiraceae
##                                                Genus Species
## a0798db0c868e9ad6089502956d68d87  g__Catenibacterium    <NA>
## c5859b7f5d14b8c145fbc3ad583c70e8    g__Coprobacillus    <NA>
## 5c4ca852b40641b3eb0ad23e69bb6583        g__Roseburia    <NA>
## 8f5ceded1cd7e86b8271d3fab24d322a     g__Butyrivibrio    <NA>
##                                                                family_genus
## a0798db0c868e9ad6089502956d68d87  f__Erysipelotrichaceae g__Catenibacterium
## c5859b7f5d14b8c145fbc3ad583c70e8    f__Erysipelotrichaceae g__Coprobacillus
## 5c4ca852b40641b3eb0ad23e69bb6583            f__Lachnospiraceae g__Roseburia
## 8f5ceded1cd7e86b8271d3fab24d322a         f__Lachnospiraceae g__Butyrivibrio
#~~~~~~~~~~~~~~~
# Plot Results
#~~~~~~~~~~~~~~~
resOTU <- rownames(sigtab)
otuData <- as.data.frame(otu_table(PSOtmao1G))
otuData_desRes <- otuData[rownames(otuData) %in% resOTU,]
#otuData_desRes <- as.data.frame(t(otuData_desRes))
# Get sample data groupings
sampleData <- as.data.frame(sample_data(PSOtmao1G))
sampleData <- subset(sampleData, select = c(tmao_mdn, tmao_quantile, tmao_tertile, tmao))
# Get tax table
taxaData <- as.data.frame(tax_table(PSOtmao1G))
taxaData_res <- taxaData[rownames(taxaData) %in% resOTU,]
taxaData_res$OrderFamilyGenus <- paste0(taxaData_res$Order, taxaData_res$Family, taxaData_res$Genus)

# Merge
ggData <- merge(taxaData_res, otuData_desRes, by = 0)
rownames(ggData) <-  ggData$OrderFamilyGenus
ggData <- ggData[,-(1:9)]
ggData <- as.data.frame(t(ggData))
ggData <- merge(ggData, sampleData, by = 0)
rownames(ggData) <- ggData[,"Row.names"]
ggData <- subset(ggData, select = -c(Row.names))

# Loop Plot
nres <- length(resOTU)
for (i in 1:nres) { 
    print(ggplot(ggData, aes(x = tmao_tertile, y = log(ggData[,i]))) +
            geom_boxplot()+
            ylab(colnames(ggData)[i])) 
}
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).

## Warning: Removed 336 rows containing non-finite values (stat_boxplot).

## Warning: Removed 304 rows containing non-finite values (stat_boxplot).

## Warning: Removed 246 rows containing non-finite values (stat_boxplot).

#~~~~~~~~~~~~~~~
# Plot Results Fold Change
#~~~~~~~~~~~~~~~
sigtabgen = subset(sigtab, !is.na(Genus))
# Phylum order
x = tapply(sigtabgen$log2FoldChange, sigtabgen$Phylum, function(x) max(x))
x = sort(x, TRUE)
sigtabgen$Phylum = factor(as.character(sigtabgen$Phylum), levels=names(x))
# Genus order
x = tapply(sigtabgen$log2FoldChange, sigtabgen$Genus, function(x) max(x))
x = sort(x, TRUE)
sigtabgen$Genus = factor(as.character(sigtabgen$Genus), levels=names(x))
DESeqPlot <- ggplot(sigtabgen, aes(y=family_genus, x=log2FoldChange, color=Phylum)) + 
  geom_vline(xintercept = 0.0, color = "gray", size = 0.5) +
  geom_point(size=3) + 
  ylab("Taxonomy") +
  xlab("Log2 Fold Change") + 
  scale_color_brewer(palette = "Paired") +
  theme_minimal() +
  theme(axis.text = element_text(size = 12), # x and y axis text size
        axis.title = element_text(size = 12), # x and y label text size
        legend.text = element_text(size = 12), # legend text size
        legend.title = element_text(size = 12), # label legend text size
        axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))

DESeqPlot

# Save
tiff("../../Outputs/plots/microbiome/DESeq2_Genus_TMAOTertile1v3_2201.tiff", width = 8, height = 3, units = 'in', res = 300)
DESeqPlot
dev.off()
## quartz_off_screen 
##                 2

Save

#write.csv(sigtabgen, "../../Outputs/tables/Microbiome/DESeq2_PSOtmao1G_TMAOTertile_2112.csv")
write.csv(sigtabgen, "../../Outputs/tables/Microbiome/DESeq2_PSOtmao1G_TMAOTertile_220119.csv")

Factored by TMAO Quantile

# DESeq analysis - tmao_quantile
mb <- phyloseq_to_deseq2(PSOtmao1G, ~ sex + age + bmi_final.x + tmao_quantile)
## converting counts to integer mode
##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.
mb.ratio <- DESeq(mb, test = "LRT", reduced = ~ sex + age + bmi_final.x) # remove variable of interest here, TMAO
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
res.ratio <- results(mb.ratio, contrast = c("tmao_quantile", "Quantile1", "Quantile4"))
alpha = 0.05
sigtab = res.ratio[which(res.ratio$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(PSOtmao1G)[rownames(sigtab), ], "matrix"))
sigtab$family_genus <- paste0(sigtab$Family, sigtab$Genus)
head(sigtab)
##                                     baseMean log2FoldChange     lfcSE
## 7363aac88b5325ac49cb469e284e10dd   14.227849      2.3978093 0.7816833
## c5859b7f5d14b8c145fbc3ad583c70e8    8.111452      0.8333416 0.7782711
## 5c4ca852b40641b3eb0ad23e69bb6583 1366.620472      0.8328632 0.2134797
## 7a297761643e32391ea7cd03b0995e62   45.010053     -0.6821178 0.6967944
##                                        stat       pvalue         padj
## 7363aac88b5325ac49cb469e284e10dd   15.08675 1.743996e-03 2.267195e-02
## c5859b7f5d14b8c145fbc3ad583c70e8  446.95386 1.490189e-96 3.874492e-95
## 5c4ca852b40641b3eb0ad23e69bb6583   16.73660 8.005937e-04 1.387696e-02
## 7a297761643e32391ea7cd03b0995e62 1576.19463 0.000000e+00 0.000000e+00
##                                      Kingdom         Phylum               Class
## 7363aac88b5325ac49cb469e284e10dd k__Bacteria  p__Firmicutes          c__Bacilli
## c5859b7f5d14b8c145fbc3ad583c70e8 k__Bacteria  p__Firmicutes  c__Erysipelotrichi
## 5c4ca852b40641b3eb0ad23e69bb6583 k__Bacteria  p__Firmicutes       c__Clostridia
## 7a297761643e32391ea7cd03b0995e62 k__Bacteria  p__Firmicutes       c__Clostridia
##                                                   Order                  Family
## 7363aac88b5325ac49cb469e284e10dd     o__Lactobacillales     f__Streptococcaceae
## c5859b7f5d14b8c145fbc3ad583c70e8  o__Erysipelotrichales  f__Erysipelotrichaceae
## 5c4ca852b40641b3eb0ad23e69bb6583       o__Clostridiales      f__Lachnospiraceae
## 7a297761643e32391ea7cd03b0995e62       o__Clostridiales      f__Lachnospiraceae
##                                              Genus Species
## 7363aac88b5325ac49cb469e284e10dd    g__Lactococcus    <NA>
## c5859b7f5d14b8c145fbc3ad583c70e8  g__Coprobacillus    <NA>
## 5c4ca852b40641b3eb0ad23e69bb6583      g__Roseburia    <NA>
## 7a297761643e32391ea7cd03b0995e62   g__Ruminococcus    <NA>
##                                                              family_genus
## 7363aac88b5325ac49cb469e284e10dd       f__Streptococcaceae g__Lactococcus
## c5859b7f5d14b8c145fbc3ad583c70e8  f__Erysipelotrichaceae g__Coprobacillus
## 5c4ca852b40641b3eb0ad23e69bb6583          f__Lachnospiraceae g__Roseburia
## 7a297761643e32391ea7cd03b0995e62       f__Lachnospiraceae g__Ruminococcus
#~~~~~~~~~~~~~~~
# Plot Results
#~~~~~~~~~~~~~~~
resOTU <- rownames(sigtab)
otuData <- as.data.frame(otu_table(PSOtmao1G))
otuData_desRes <- otuData[rownames(otuData) %in% resOTU,]
#otuData_desRes <- as.data.frame(t(otuData_desRes))
# Get sample data groupings
sampleData <- as.data.frame(sample_data(PSOtmao1G))
sampleData <- subset(sampleData, select = c(tmao_mdn, tmao_quantile, tmao_tertile, tmao))
# Get tax table
taxaData <- as.data.frame(tax_table(PSOtmao1G))
taxaData_res <- taxaData[rownames(taxaData) %in% resOTU,]
taxaData_res$OrderFamilyGenus <- paste0(taxaData_res$Order, taxaData_res$Family, taxaData_res$Genus)

# Merge
ggData <- merge(taxaData_res, otuData_desRes, by = 0)
rownames(ggData) <-  ggData$OrderFamilyGenus
ggData <- ggData[,-(1:9)]
ggData <- as.data.frame(t(ggData))
ggData <- merge(ggData, sampleData, by = 0)
rownames(ggData) <- ggData[,"Row.names"]
ggData <- subset(ggData, select = -c(Row.names))

# Loop Plot
nres <- length(resOTU)
for (i in 1:nres) { 
    print(ggplot(ggData, aes(x = tmao_quantile, y = log(ggData[,i]))) +
            geom_boxplot()+
            ylab(colnames(ggData)[i])) 
}
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).

## Warning: Removed 242 rows containing non-finite values (stat_boxplot).

## Warning: Removed 202 rows containing non-finite values (stat_boxplot).

## Warning: Removed 246 rows containing non-finite values (stat_boxplot).

#~~~~~~~~~~~~~~~
# Plot Results Fold Change
#~~~~~~~~~~~~~~~
sigtabgen = subset(sigtab, !is.na(Genus))
# Phylum order
x = tapply(sigtabgen$log2FoldChange, sigtabgen$Phylum, function(x) max(x))
x = sort(x, TRUE)
sigtabgen$Phylum = factor(as.character(sigtabgen$Phylum), levels=names(x))
# Genus order
x = tapply(sigtabgen$log2FoldChange, sigtabgen$Genus, function(x) max(x))
x = sort(x, TRUE)
sigtabgen$Genus = factor(as.character(sigtabgen$Genus), levels=names(x))
ggplot(sigtabgen, aes(y=family_genus, x=log2FoldChange, color=Phylum)) + 
  geom_vline(xintercept = 0.0, color = "gray", size = 0.5) +
  geom_point(size=3) + 
  ylab("Taxonomy") +
  xlab("Log2 Fold Change") + 
  scale_color_brewer(palette = "Dark2") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))

Alpha Diversity Analysis

We do not need to redo the alpha diversity calculations because the filtering we did was not to remove technical variation but instead to focus our analysis on the more common players (also so we can be more statistically confident using less scarce data). Note, if you did want to calculate alpha diversity, the command estimate_richness() can accomplish that for you.

Check the distributions of the alpha diversity metrics and run the multiple linear regression models with covariates.

# Load master pheno file
phen <- readRDS("../../data/processed/phenotype/Phenotypes_211108.rds")
phen <- subset(phen, select = -c(sex, age, age_cat, bmi_cat))

# Grab PSO sample data
df <- as(sample_data(PSOtmao1G), "data.frame")
# Merge
df2 <- merge(phen, df, by = 0)
row.names(df2) <- df2[,"Row.names"]
df2 <- subset(df2, select =-c(Row.names))

#~~~~~~~~~~~~~~~~
# Compute
#~~~~~~~~~~~~~~~~

# Check distribution
shapiro.test(df2$shannon) # not normal, proceed with caution
## 
##  Shapiro-Wilk normality test
## 
## data:  df2$shannon
## W = 0.905, p-value = 4.423e-14
shapiro.test(log(df2$shannon))
## 
##  Shapiro-Wilk normality test
## 
## data:  log(df2$shannon)
## W = 0.84813, p-value < 2.2e-16
# Run linear regression
summary(lm(tmao_log ~ shannon + sex*age + cystatinc_bd1, df2))
## 
## Call:
## lm(formula = tmao_log ~ shannon + sex * age + cystatinc_bd1, 
##     data = df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.56039 -0.30824 -0.01383  0.28203  1.82041 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.307376   0.302115  -1.017 0.309666    
## shannon        0.172077   0.044785   3.842 0.000145 ***
## sex2          -0.431212   0.163883  -2.631 0.008888 ** 
## age           -0.001198   0.002854  -0.420 0.674953    
## cystatinc_bd1  0.695472   0.194619   3.573 0.000402 ***
## sex2:age       0.009960   0.003812   2.613 0.009374 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4914 on 347 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.1311, Adjusted R-squared:  0.1185 
## F-statistic: 10.47 on 5 and 347 DF,  p-value: 2.264e-09
# Check distribution
shapiro.test(df2$faith_pd) # normal
## 
##  Shapiro-Wilk normality test
## 
## data:  df2$faith_pd
## W = 0.99699, p-value = 0.7601
# Run linear regression
summary(lm(tmao_log ~ faith_pd + sex*age + cystatinc_bd1, df2))
## 
## Call:
## lm(formula = tmao_log ~ faith_pd + sex * age + cystatinc_bd1, 
##     data = df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.67973 -0.30954 -0.02643  0.29712  1.79212 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.1109111  0.2293375   0.484 0.628964    
## faith_pd       0.0653056  0.0170657   3.827 0.000154 ***
## sex2          -0.3998769  0.1636350  -2.444 0.015035 *  
## age           -0.0009222  0.0028409  -0.325 0.745677    
## cystatinc_bd1  0.6827331  0.1946261   3.508 0.000511 ***
## sex2:age       0.0097816  0.0038107   2.567 0.010680 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4915 on 347 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.1308, Adjusted R-squared:  0.1183 
## F-statistic: 10.44 on 5 and 347 DF,  p-value: 2.39e-09
# Check distribution
shapiro.test(df2$pielou_e) # not normal, proceed with caution
## 
##  Shapiro-Wilk normality test
## 
## data:  df2$pielou_e
## W = 0.81544, p-value < 2.2e-16
# Run linear regression
summary(lm(tmao_log ~ pielou_e + sex*age + cystatinc_bd1, df2))
## 
## Call:
## lm(formula = tmao_log ~ pielou_e + sex * age + cystatinc_bd1, 
##     data = df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.59096 -0.29261 -0.00173  0.28921  1.87433 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.2456861  0.3759449  -0.654 0.513856    
## pielou_e       1.1175333  0.4316197   2.589 0.010026 *  
## sex2          -0.4311371  0.1660659  -2.596 0.009827 ** 
## age           -0.0001399  0.0028627  -0.049 0.961054    
## cystatinc_bd1  0.6877669  0.1968057   3.495 0.000536 ***
## sex2:age       0.0097826  0.0038562   2.537 0.011623 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.497 on 347 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.1113, Adjusted R-squared:  0.09846 
## F-statistic: 8.689 on 5 and 347 DF,  p-value: 8.927e-08
# Check distribution
shapiro.test(df2$observed_otus) # normal
## 
##  Shapiro-Wilk normality test
## 
## data:  df2$observed_otus
## W = 0.99609, p-value = 0.5367
# Run linear regression
summary(lm(tmao_log ~ observed_otus + sex*age + cystatinc_bd1, df2))
## 
## Call:
## lm(formula = tmao_log ~ observed_otus + sex * age + cystatinc_bd1, 
##     data = df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.71047 -0.30903 -0.02197  0.28130  1.83166 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.1663387  0.2189965   0.760 0.448039    
## observed_otus  0.0033524  0.0008354   4.013 7.35e-05 ***
## sex2          -0.3924683  0.1632997  -2.403 0.016769 *  
## age           -0.0011621  0.0028421  -0.409 0.682870    
## cystatinc_bd1  0.6889571  0.1942392   3.547 0.000443 ***
## sex2:age       0.0096736  0.0038019   2.544 0.011380 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4905 on 347 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.1343, Adjusted R-squared:  0.1218 
## F-statistic: 10.76 on 5 and 347 DF,  p-value: 1.233e-09

Alpha Diversity Plots

Make plots that demonstrate the relationship between TMAO and the alpha-diversity metrics.

# Shannon
y_variable <- "Shannon"
x_variable <- "Natural Log of TMAO (uM)"
mdn_cut <- log(median(df2$tmao.x))
# Plot
shannon_plot <- ggplot(df2, aes(x=tmao_log, y=shannon)) +
  geom_point(aes(shape=bmi_cat, color = age_cat)) +
  geom_vline(xintercept = mdn_cut, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE, fullrange = F) +
  theme_minimal() + 
  ylab(y_variable) +
  xlab(x_variable) +
  scale_shape_discrete(name="BMI Bin (kg/m^2)", labels=c("18.5-24", "25-29", "30>")) +
  scale_color_brewer(palette = "Dark2",name="Age Bin (yr)", labels=c("18-33", "34-49", "50-65")) +
  theme(axis.text = element_text(size = 12), # x and y axis text size
        axis.title = element_text(size = 12), # x and y label text size
        legend.text = element_text(size = 12), # legend text size
        legend.title = element_text(size = 12), # label legend text size
        axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))
shannon_plot
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).

# Faith's Phylogenetic Diversity
y_variable <- "Faith's Phylogenetic Diversity"
x_variable <- "Natural Log of TMAO (uM)"
mdn_cut <- log(median(df2$tmao.x))
# Plot
faith_plot <- ggplot(df2, aes(x=tmao_log, y=faith_pd)) +
  geom_point(aes(shape=bmi_cat, color = age_cat)) +
  geom_vline(xintercept = mdn_cut, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE, fullrange = F) +
  theme_minimal() + 
  ylab(y_variable) +
  xlab(x_variable) +
  labs(color = "Age bin", shape = "BMI bin") +
  scale_color_brewer(palette = "Dark2") +
  theme(axis.text = element_text(size = 12), # x and y axis text size
        axis.title = element_text(size = 12), # x and y label text size
        legend.text = element_text(size = 12), # legend text size
        legend.title = element_text(size = 12), # label legend text size
        axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))
faith_plot
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).

## Warning: Removed 2 rows containing missing values (geom_point).

# Pielou's Evenness
y_variable <- "Pielou's Evenness"
x_variable <- "Natural Log of TMAO (uM)"
mdn_cut <- log(median(df2$tmao.x))
# Plot
pielou_plot <- ggplot(df2, aes(x=tmao_log, y=pielou_e)) +
  geom_point(aes(shape=bmi_cat, color = age_cat)) +
  geom_vline(xintercept = mdn_cut, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE, fullrange = F) +
  theme_minimal() + 
  ylab(y_variable) +
  xlab(x_variable) +
  labs(color = "Age bin", shape = "BMI bin") +
  scale_color_brewer(palette = "Dark2") +
  theme(axis.text = element_text(size = 12), # x and y axis text size
        axis.title = element_text(size = 12), # x and y label text size
        legend.text = element_text(size = 12), # legend text size
        legend.title = element_text(size = 12), # label legend text size
        axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))
pielou_plot
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).

## Warning: Removed 2 rows containing missing values (geom_point).

# Observed OTUs
y_variable <- "Observed OTUs"
x_variable <- "Natural Log of TMAO (uM)"
mdn_cut <- log(median(df2$tmao.x))
# Plot
obsOTU_plot <- ggplot(df2, aes(x=tmao_log, y=observed_otus)) +
  geom_point(aes(shape=bmi_cat, color = age_cat)) +
  geom_vline(xintercept = mdn_cut, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE, fullrange = F) +
  theme_minimal() + 
  ylab(y_variable) +
  xlab(x_variable) +
  labs(color = "Age bin", shape = "BMI bin") +
  scale_color_brewer(palette = "Dark2") +
  theme(axis.text = element_text(size = 12), # x and y axis text size
        axis.title = element_text(size = 12), # x and y label text size
        legend.text = element_text(size = 12), # legend text size
        legend.title = element_text(size = 12), # label legend text size
        axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))
obsOTU_plot
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).

## Warning: Removed 2 rows containing missing values (geom_point).

Save

# Save
tiff("../../Outputs/plots/microbiome/AlphaDiversity_Shannon_TMAO_2112.tiff", width = 6, height = 4, units = 'in', res = 300)
shannon_plot
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
dev.off()
## quartz_off_screen 
##                 2

Beta Diversity Analysis

# PCoA
PCoA_WU <- ordinate(PSOtmao1G, "PCoA", "wunifrac")
PCoA_UnWU <- ordinate(PSOtmao1G, "PCoA", "uunifrac")
PCoA_BC <- ordinate(PSOtmao1G, "PCoA", "bray")

# Calculate distance matrix
dismax_wu = phyloseq::distance(PSOtmao1G, method="wunifrac", type = "samples")
dismax_uwu = phyloseq::distance(PSOtmao1G, method="unifrac", type = "samples")
dismax_bc = phyloseq::distance(PSOtmao1G, method="bray", type = "samples")

Save

# Outdated
#save(PCoA_WU, file = "../../data/processed/microbiome/diversity/PSOtmao1G_PCoA_WU.RData")
#save(PCoA_UnWU, file = "../../data/processed/microbiome/diversity/PSOtmao1G_PCoA_UnWU.RData")
#save(PCoA_BC, file = "../../data/processed/microbiome/diversity/PSOtmao1G_PCoA_BC.RData")

#save(dismax_wu, file = "../../data/processed/microbiome/diversity/PSOtmao1G_dismax_WU.RData")
#save(dismax_uwu, file = "../../data/processed/microbiome/diversity/PSOtmao1G_dismax_UWU.RData")
#save(dismax_bc, file = "../../data/processed/microbiome/diversity/PSOtmao1G_dismax_BC.RData")

# With updated taxonomy table and from PSOtmaoNA
save(PCoA_WU, file = "../../data/processed/microbiome/diversity/PSOtmao1G_PCoA_WU_220119.RData")
save(PCoA_UnWU, file = "../../data/processed/microbiome/diversity/PSOtmao1G_PCoA_UnWU_220119.RData")
save(PCoA_BC, file = "../../data/processed/microbiome/diversity/PSOtmao1G_PCoA_BC_220119.RData")

save(dismax_wu, file = "../../data/processed/microbiome/diversity/PSOtmao1G_dismax_WU_220119.RData")
save(dismax_uwu, file = "../../data/processed/microbiome/diversity/PSOtmao1G_dismax_UWU_220119.RData")
save(dismax_bc, file = "../../data/processed/microbiome/diversity/PSOtmao1G_dismax_BC_220119.RData")

Load

load(file = "../../data/processed/microbiome/diversity/PSOtmao1G_PCoA_WU_220119.RData")
load(file = "../../data/processed/microbiome/diversity/PSOtmao1G_PCoA_UnWU_220119.RData")
load(file = "../../data/processed/microbiome/diversity/PSOtmao1G_PCoA_BC_220119.RData")

load(file = "../../data/processed/microbiome/diversity/PSOtmao1G_dismax_WU_220119.RData")
load(file = "../../data/processed/microbiome/diversity/PSOtmao1G_dismax_UWU_220119.RData")
load(file = "../../data/processed/microbiome/diversity/PSOtmao1G_dismax_BC_220119.RData")
# Get sample_data
df <- as(sample_data(PSOtmao1G), "data.frame")
df$tmao_mdn <- relevel(df$tmao_mdn, ref = "below")

#~~~~~~~~~~~
# Weighted
#~~~~~~~~~~~
# Beta dispersion
bdis <- betadisper(dismax_wu, df$tmao_mdn)
permutest(bdis) # NS, good
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##            Df Sum Sq  Mean Sq      F N.Perm Pr(>F)  
## Groups      1 0.0383 0.038279 2.8245    999  0.094 .
## Residuals 353 4.7840 0.013552                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
boxplot(bdis, las = 2)

# ADONIS (perMANOVA) analysis
adonis2(dismax_wu ~ tmao_log + sex + age + bmi_final.y, data=df, method = "wunifrac", perm=999) # .076
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = dismax_wu ~ tmao_log + sex + age + bmi_final.y, data = df, permutations = 999, method = "wunifrac")
##              Df SumOfSqs      R2      F Pr(>F)    
## tmao_log      1   0.1186 0.00688 2.4987  0.061 .  
## sex           1   0.3836 0.02225 8.0818  0.001 ***
## age           1   0.0892 0.00518 1.8795  0.110    
## bmi_final.y   1   0.0332 0.00192 0.6984  0.557    
## Residual    350  16.6144 0.96377                  
## Total       354  17.2391 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#adonis2(tmao_log ~ dismax_wu + sex*age + cystatinc_bd1, data=df, method = "wunifrac", perm=999) 

#~~~~~~~~~~~
# Unweighted
#~~~~~~~~~~~
# Beta dispersion
bdis <- betadisper(dismax_uwu, df$tmao_mdn)
permutest(bdis) # NS, good
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##            Df Sum Sq   Mean Sq      F N.Perm Pr(>F)
## Groups      1 0.0037 0.0037027 1.2326    999  0.272
## Residuals 353 1.0604 0.0030039
boxplot(bdis, las = 2)

# ADONIS (perMANOVA) analysis
adonis2(dismax_uwu ~ tmao_log + sex + age + bmi_final.y, data=df, method = "unifrac", perm=999) # .001
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = dismax_uwu ~ tmao_log + sex + age + bmi_final.y, data = df, permutations = 999, method = "unifrac")
##              Df SumOfSqs      R2      F Pr(>F)    
## tmao_log      1   0.1789 0.01153 4.1722  0.001 ***
## sex           1   0.1828 0.01177 4.2611  0.001 ***
## age           1   0.0655 0.00422 1.5281  0.153    
## bmi_final.y   1   0.0828 0.00533 1.9299  0.053 .  
## Residual    350  15.0113 0.96714                  
## Total       354  15.5213 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#adonis2(tmao_log ~ dismax_uwu + sex*age + cystatinc_bd1, data=df, method = "unifrac", perm=999) 

#~~~~~~~~~~~
# BrayCurtis
#~~~~~~~~~~~
# Beta dispersion
bdis <- betadisper(dismax_bc, df$tmao_mdn)
permutest(bdis) # .04
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##            Df Sum Sq  Mean Sq      F N.Perm Pr(>F)  
## Groups      1 0.0421 0.042097 3.6508    999  0.076 .
## Residuals 353 4.0705 0.011531                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
boxplot(bdis, las = 2)

# ADONIS (perMANOVA) analysis
adonis2(dismax_bc ~ tmao_log + sex + age + bmi_final.y, data=df, method = "bray", perm=999) # .017
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = dismax_bc ~ tmao_log + sex + age + bmi_final.y, data = df, permutations = 999, method = "bray")
##              Df SumOfSqs      R2      F Pr(>F)    
## tmao_log      1    0.295 0.00678 2.4649  0.017 *  
## sex           1    0.727 0.01675 6.0856  0.001 ***
## age           1    0.383 0.00883 3.2067  0.002 ** 
## bmi_final.y   1    0.184 0.00423 1.5371  0.103    
## Residual    350   41.822 0.96341                  
## Total       354   43.411 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#adonis2(tmao_log ~ dismax_bc + sex*age + cystatinc_bd1, data=df, method = "bray", perm=999) 

Beta Diversity Plots

# Weighted
betaplot_WU <- plot_ordination(PSOtmao1G, PCoA_WU, 
                    color="tmao_mdn",
                    shape = NULL,
                    axes = 1:2) +
  scale_color_brewer(palette = "Dark2", labels = c("Below", "Above")) +
  stat_ellipse(linetype = 1) +
  labs(color = "TMAO Median Threshold") +
  theme_minimal() + 
  theme(axis.text = element_text(size = 12), # x and y axis text size
        axis.title = element_text(size = 12), # x and y label text size
        legend.text = element_text(size = 12), # legend text size
        legend.title = element_text(size = 12), # label legend text size
        axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))

betaplot_WU

# Unweighted
betaplot_UWU <- plot_ordination(PSOtmao1G, PCoA_UnWU, 
                    color="tmao_mdn",
                    shape = NULL,
                    axes = 1:2) +
  scale_color_brewer(palette = "Dark2", labels = c("Below", "Above")) +
  stat_ellipse(linetype = 1) +
  labs(color = "TMAO Median Threshold") +
  theme_minimal() + 
  theme(axis.text = element_text(size = 12), # x and y axis text size
        axis.title = element_text(size = 12), # x and y label text size
        legend.text = element_text(size = 12), # legend text size
        legend.title = element_text(size = 12), # label legend text size
        axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))

betaplot_UWU

# Bray Curtis
betaplot_bc <- plot_ordination(PSOtmao1G, PCoA_BC, 
                    color="tmao_mdn",
                    shape = NULL,
                    axes = 1:2) +
  scale_color_brewer(palette = "Dark2", labels = c("Below", "Above")) +
  stat_ellipse(linetype = 1) +
  labs(color = "TMAO Median Threshold") +
  theme_minimal() + 
  theme(axis.text = element_text(size = 12), # x and y axis text size
        axis.title = element_text(size = 12), # x and y label text size
        legend.text = element_text(size = 12), # legend text size
        legend.title = element_text(size = 12), # label legend text size
        axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))

betaplot_bc

Save

# Save
#tiff("../../Outputs/plots/microbiome/BetaDiversity_PCoAWU_TMAOmdn_2112.tiff", width = 6, height = 4, units = 'in', res = 300)
tiff("../../Outputs/plots/microbiome/BetaDiversity_PCoAWU_TMAOmdn_2201.tiff", width = 6, height = 4, units = 'in', res = 300)
betaplot_WU
dev.off()
## quartz_off_screen 
##                 2
# unweighted
#tiff("../../Outputs/plots/microbiome/BetaDiversity_PCoAWU_TMAOmdn_2112.tiff", width = 6, height = 4, units = 'in', res = 300)
tiff("../../Outputs/plots/microbiome/BetaDiversity_PCoAWU_TMAOmdn_2209.tiff", width = 6, height = 4, units = 'in', res = 300)
betaplot_WU
dev.off()
## quartz_off_screen 
##                 2

Play around with the coloring to try to figure out what is discriminating these groups!

colnames(sample_data(PSOtmao1G))
##  [1] "Description"                     "SampleID"                       
##  [3] "LinkerPrimerSequence"            "DNA_ng_ul"                      
##  [5] "DNA_isolation_date"              "Batch"                          
##  [7] "SampleType"                      "deblur_seq_depth"               
##  [9] "dada2_seq_depth"                 "bmi_final.x"                    
## [11] "ht_cm"                           "weightv2"                       
## [13] "screen_age"                      "screen_sex"                     
## [15] "faith_pd"                        "shannon"                        
## [17] "observed_otus"                   "pielou_e"                       
## [19] "pa_ee_total"                     "endo_ln_rhi"                    
## [21] "endo_al75"                       "betaine"                        
## [23] "carnitine"                       "tmao"                           
## [25] "choline"                         "creatinine"                     
## [27] "phosphocholine"                  "hei_asa24_totalveg"             
## [29] "hei_asa24_totalproteinfoods"     "hei_asa24_sodium"               
## [31] "hei_asa24_refinedgrains"         "hei_asa24_sfat"                 
## [33] "hei_asa24_addsug"                "hei_asa24_wholegrain"           
## [35] "hei_asa24_totalscore"            "tmao_log"                       
## [37] "pa_ee_total_mdn"                 "endo_ln_rhi_mdn"                
## [39] "endo_al75_mdn"                   "betaine_mdn"                    
## [41] "carnitine_mdn"                   "tmao_mdn"                       
## [43] "choline_mdn"                     "creatinine_mdn"                 
## [45] "phosphocholine_mdn"              "hei_asa24_totalveg_mdn"         
## [47] "hei_asa24_totalproteinfoods_mdn" "hei_asa24_sodium_mdn"           
## [49] "hei_asa24_refinedgrains_mdn"     "hei_asa24_sfat_mdn"             
## [51] "hei_asa24_addsug_mdn"            "hei_asa24_wholegrain_mdn"       
## [53] "hei_asa24_totalscore_mdn"        "tmao_log_mdn"                   
## [55] "TMAO_5"                          "TMAO_6"                         
## [57] "TMAO_7"                          "sex"                            
## [59] "age"                             "bmi_final.y"                    
## [61] "age_cat"                         "bmi_cat"                        
## [63] "tmao_quantile"                   "tmao_tertile"
loopPheno <- c("pa_ee_total_mdn","endo_ln_rhi_mdn", "endo_al75_mdn", "betaine_mdn", "carnitine_mdn", "tmao_mdn",  "choline_mdn", "creatinine_mdn", "phosphocholine_mdn", "hei_asa24_totalveg_mdn", "hei_asa24_totalproteinfoods_mdn", "hei_asa24_sodium_mdn", "hei_asa24_refinedgrains_mdn", "hei_asa24_sfat_mdn",  "hei_asa24_addsug_mdn", "hei_asa24_wholegrain_mdn", "hei_asa24_totalscore_mdn", "tmao_log_mdn", "TMAO_5", "TMAO_6", "TMAO_7", "sex", "age_cat", "bmi_cat", "tmao_quantile", "tmao_tertile")   

# PCoA_WU, PCoA_UnWU, PCoA_BC plug in to view
for(i in 1:length(loopPheno)){
print(plot_ordination(PSOtmao1G, PCoA_WU, 
                    color = loopPheno[i],
                    shape = NULL,
                    axes = 1:2) +
  #scale_color_brewer(palette = "Dark2", labels = c("Below", "Above")) +
  stat_ellipse(linetype = 1) +
  labs(color = loopPheno[i]) +
  theme_minimal() + 
  theme(axis.text = element_text(size = 12), # x and y axis text size
        axis.title = element_text(size = 12), # x and y label text size
        legend.text = element_text(size = 12), # legend text size
        legend.title = element_text(size = 12), # label legend text size
        axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)))
}

# From the plots, Weighted UniFrac and endo_al75 and betaine look different
bdis <- betadisper(dismax_wu, df$betaine_mdn)
permutest(bdis) # NS, good
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##            Df Sum Sq   Mean Sq     F N.Perm Pr(>F)
## Groups      1 0.0094 0.0093671 0.697    999  0.398
## Residuals 353 4.7438 0.0134385
boxplot(bdis, las = 2)

# ADONIS (perMANOVA) analysis
adonis2(dismax_wu ~ betaine + sex + age + bmi_final.y, data=df, method = "wunifrac", perm=999) # different
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = dismax_wu ~ betaine + sex + age + bmi_final.y, data = df, permutations = 999, method = "wunifrac")
##              Df SumOfSqs      R2      F Pr(>F)    
## betaine       1   0.3633 0.02108 7.7400  0.001 ***
## sex           1   0.2656 0.01541 5.6582  0.003 ** 
## age           1   0.1552 0.00900 3.3064  0.030 *  
## bmi_final.y   1   0.0253 0.00147 0.5397  0.675    
## Residual    350  16.4296 0.95304                  
## Total       354  17.2391 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANCOM

ANCOM2 no-covariate analysis

source("../../Rscripts/ancom_v2.1.R")
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
## 
##     collapse
## The following object is masked from 'package:IRanges':
## 
##     collapse
## Welcome to compositions, a package for compositional data analysis.
## Find an intro with "? compositions"
## 
## Attaching package: 'compositions'
## The following objects are masked from 'package:IRanges':
## 
##     cor, cov, var
## The following objects are masked from 'package:S4Vectors':
## 
##     cor, cov, var
## The following objects are masked from 'package:BiocGenerics':
## 
##     normalize, var
## The following objects are masked from 'package:stats':
## 
##     anova, cor, cov, dist, var
## The following objects are masked from 'package:base':
## 
##     %*%, norm, scale, scale.default
# OTU table
MicroData <- otu_table(PSOtmao1G)
MicroData <- as.data.frame(as.matrix(MicroData), stringsAsFactors = F)
# Add 1
#MicroData2 <- MicroData +1
#MicroData2 <- as.data.frame(as.matrix(t(MicroData2),stringsAsFactors = F))
#dim(MicroData2) # 103 x 98

# Load the phenotype data
#------------------------
# Perform ANCOM analysis
# Step 1: Data preprocessing
# OTU data
otu_data <- otu_table(PSOtmao1G)
otu_data <- as.data.frame(as.matrix(otu_data), stringsAsFactors = F)
otu_id <- rownames(otu_data)

# Metadata 
#df <- readRDS("../data_processed/mapping/Biocrates_mapping_BD1_210809.rds")
#df$p.Cresol.SO4_bd1_mdn <- plyr::revalue(df$p.Cresol.SO4_bd1_mdn, c("0" = "< Median", "1" = "> Median"))
# Subset
MetaData2 <- df2 %>% dplyr::select(c(tmao_mdn, age, sex, bmi_final))
MetaData2$Sample.ID <- rownames(MetaData2)


feature_table = otu_data; sample_var = "Sample.ID"; group_var = NULL
out_cut = 0.05; zero_cut = 0.95; lib_cut = 1000; neg_lb = FALSE # make zero_cut=0.95 to match criteria used for other metabolites
# Function
prepro = feature_table_pre_process(feature_table = feature_table,
                                   meta_data = MetaData2,
                                   sample_var = sample_var,
                                   group_var = NULL,
                                   out_cut = out_cut,
                                   zero_cut = zero_cut,
                                   lib_cut = lib_cut,
                                   neg_lb = neg_lb)

# Step 2: ANCOM
feature_table = prepro$feature_table 
meta_data = prepro$meta_data
struc_zero = prepro$structure_zeros

# Define variables for ANCOM
main_var = "tmao_mdn"; p_adj_method = "BH"; alpha = 0.05
adj_formula = NULL ; rand_formula = NULL
# Run ANCCOM
t_start = Sys.time()
res = ANCOM(feature_table = feature_table, 
            meta_data = meta_data, 
            struc_zero = struc_zero, 
            main_var = main_var, 
            p_adj_method = p_adj_method, 
            alpha = alpha, 
            adj_formula = adj_formula, 
            rand_formula = rand_formula)
t_end = Sys.time()
t_run = t_end - t_start 
t_run # around 5s
## Time difference of 4.073804 secs
# Results
res$out
##                             taxa_id  W detected_0.9 detected_0.8 detected_0.7
## 1  f1d7ac8c18c1d8144a00e4d785c86e4e  1        FALSE        FALSE        FALSE
## 2  05404c9fdf9f3f334eb618bac3f434f6  0        FALSE        FALSE        FALSE
## 3  06105df60508c2ed24a54f1b8ed64e49  0        FALSE        FALSE        FALSE
## 4  bb1b75f41ff9c9db1d1de41e8388eb52  0        FALSE        FALSE        FALSE
## 5  1efe70e365249c0d2fc28580b6ba0529  0        FALSE        FALSE        FALSE
## 6  098c3bbd8234f4ac198297ac0bde957d  0        FALSE        FALSE        FALSE
## 7  db8f48a3fe2fca95fb4986f5507b9076  0        FALSE        FALSE        FALSE
## 8  3f5533292a655d0f54a64560d65e823b  0        FALSE        FALSE        FALSE
## 9  7017e1746ae742d69c50c1274cc2f02e  0        FALSE        FALSE        FALSE
## 10 803eb52cfe3d77bdf9fe14c011e425fb  0        FALSE        FALSE        FALSE
## 11 5be4e26b904b9d711eb829785568ff92  0        FALSE        FALSE        FALSE
## 12 a41766e0533ab64a9966f362a7938478  0        FALSE        FALSE        FALSE
## 13 4b11c7e76e289279c460920e677924c4  0        FALSE        FALSE        FALSE
## 14 5f0cbc930515c7ec606289b79d8c4ba3  0        FALSE        FALSE        FALSE
## 15 30bb271b2baa58b49fc0b15d7a72b322  0        FALSE        FALSE        FALSE
## 16 2e3414cd356e335e4f675efeb43938b3  0        FALSE        FALSE        FALSE
## 17 8fabb679415fe6a9969015076873e211 41        FALSE         TRUE         TRUE
## 18 7363aac88b5325ac49cb469e284e10dd  1        FALSE        FALSE        FALSE
## 19 2dc2cf60102ff00cd077f2304ec84d06  0        FALSE        FALSE        FALSE
## 20 cbb17c39cfea0aa843212c70619d1316  0        FALSE        FALSE        FALSE
## 21 555a3619c7746cdad6de2b1c181e791c  0        FALSE        FALSE        FALSE
## 22 c1bb8cdcab0662dec8ca827a53614d5e  0        FALSE        FALSE        FALSE
## 23 a3f9e91d15a189d0cf1e0596b9688957  1        FALSE        FALSE        FALSE
## 24 a0798db0c868e9ad6089502956d68d87  0        FALSE        FALSE        FALSE
## 25 c5859b7f5d14b8c145fbc3ad583c70e8  5        FALSE        FALSE        FALSE
## 26 41712010fd4dd22c723e5490a0184a5d  0        FALSE        FALSE        FALSE
## 27 590455138b9a5855dcb8f43167711f49  0        FALSE        FALSE        FALSE
## 28 cab39ce0fdcec18719b175b8181bd917  1        FALSE        FALSE        FALSE
## 29 4a4d11bb32e68079b4c71483bf0fcf36 35        FALSE        FALSE        FALSE
## 30 d5c6c4e5dc8f5bc4536e4fc361eca630  0        FALSE        FALSE        FALSE
## 31 e0b0817162b44d0e17a1c50486839924  0        FALSE        FALSE        FALSE
## 32 6960eba3db7d4d863d042ab497d7481a  0        FALSE        FALSE        FALSE
## 33 8512cc461fa8b253841200871faba531  0        FALSE        FALSE        FALSE
## 34 a38b6324c57e2ef3c842180166aeb60f  0        FALSE        FALSE        FALSE
## 35 172049e46605909f99651e4d0efb8578  1        FALSE        FALSE        FALSE
## 36 6e13c195d4554dd8cf4923df9decd183  0        FALSE        FALSE        FALSE
## 37 d2a695aa271adf072749a729fd96a731  1        FALSE        FALSE        FALSE
## 38 42de4e368cca3ff50954f82c12dd5315  2        FALSE        FALSE        FALSE
## 39 f966e124604e0e32b209b88df6e42cd4  0        FALSE        FALSE        FALSE
## 40 f7c08b0d8e574f7c089b95e0e4344d5e  0        FALSE        FALSE        FALSE
## 41 1dda53416f231a3345668df39d4ae780  2        FALSE        FALSE        FALSE
## 42 8c6b152535efbebf12179855cb31b8d8  0        FALSE        FALSE        FALSE
## 43 8aa67fe08404e3d574aafb6622786c0f  0        FALSE        FALSE        FALSE
## 44 ec70d97b11476164cbd828e126ad10da  0        FALSE        FALSE        FALSE
## 45 5c4ca852b40641b3eb0ad23e69bb6583  0        FALSE        FALSE        FALSE
## 46 8455a8f642c4b847f76baa31664bcfaa  0        FALSE        FALSE        FALSE
## 47 9a299c48d0e3ddea2122806528070d6e  0        FALSE        FALSE        FALSE
## 48 e4ae256bb51896c21795f743dc9ed9dd  0        FALSE        FALSE        FALSE
## 49 7a297761643e32391ea7cd03b0995e62  0        FALSE        FALSE        FALSE
## 50 73b4f14d16f02ee0ac82838166128de4  0        FALSE        FALSE        FALSE
## 51 8f5ceded1cd7e86b8271d3fab24d322a  0        FALSE        FALSE        FALSE
## 52 e00f85cb3452e37943500e86afb268f4  0        FALSE        FALSE        FALSE
##    detected_0.6
## 1         FALSE
## 2         FALSE
## 3         FALSE
## 4         FALSE
## 5         FALSE
## 6         FALSE
## 7         FALSE
## 8         FALSE
## 9         FALSE
## 10        FALSE
## 11        FALSE
## 12        FALSE
## 13        FALSE
## 14        FALSE
## 15        FALSE
## 16        FALSE
## 17         TRUE
## 18        FALSE
## 19        FALSE
## 20        FALSE
## 21        FALSE
## 22        FALSE
## 23        FALSE
## 24        FALSE
## 25        FALSE
## 26        FALSE
## 27        FALSE
## 28        FALSE
## 29         TRUE
## 30        FALSE
## 31        FALSE
## 32        FALSE
## 33        FALSE
## 34        FALSE
## 35        FALSE
## 36        FALSE
## 37        FALSE
## 38        FALSE
## 39        FALSE
## 40        FALSE
## 41        FALSE
## 42        FALSE
## 43        FALSE
## 44        FALSE
## 45        FALSE
## 46        FALSE
## 47        FALSE
## 48        FALSE
## 49        FALSE
## 50        FALSE
## 51        FALSE
## 52        FALSE
res$fig$data
##                                                           taxa_id            x
## f1d7ac8c18c1d8144a00e4d785c86e4e f1d7ac8c18c1d8144a00e4d785c86e4e -0.086218666
## 05404c9fdf9f3f334eb618bac3f434f6 05404c9fdf9f3f334eb618bac3f434f6 -0.094009689
## 06105df60508c2ed24a54f1b8ed64e49 06105df60508c2ed24a54f1b8ed64e49 -0.065438528
## bb1b75f41ff9c9db1d1de41e8388eb52 bb1b75f41ff9c9db1d1de41e8388eb52 -0.124785851
## 1efe70e365249c0d2fc28580b6ba0529 1efe70e365249c0d2fc28580b6ba0529 -0.152379131
## 098c3bbd8234f4ac198297ac0bde957d 098c3bbd8234f4ac198297ac0bde957d  0.041043102
## db8f48a3fe2fca95fb4986f5507b9076 db8f48a3fe2fca95fb4986f5507b9076 -0.277701763
## 3f5533292a655d0f54a64560d65e823b 3f5533292a655d0f54a64560d65e823b -0.182837956
## 7017e1746ae742d69c50c1274cc2f02e 7017e1746ae742d69c50c1274cc2f02e -0.006409051
## 803eb52cfe3d77bdf9fe14c011e425fb 803eb52cfe3d77bdf9fe14c011e425fb  0.032416779
## 5be4e26b904b9d711eb829785568ff92 5be4e26b904b9d711eb829785568ff92  0.023829617
## a41766e0533ab64a9966f362a7938478 a41766e0533ab64a9966f362a7938478 -0.235471065
## 4b11c7e76e289279c460920e677924c4 4b11c7e76e289279c460920e677924c4  0.012949460
## 5f0cbc930515c7ec606289b79d8c4ba3 5f0cbc930515c7ec606289b79d8c4ba3 -0.005617348
## 30bb271b2baa58b49fc0b15d7a72b322 30bb271b2baa58b49fc0b15d7a72b322 -0.034044178
## 2e3414cd356e335e4f675efeb43938b3 2e3414cd356e335e4f675efeb43938b3 -0.018012080
## 8fabb679415fe6a9969015076873e211 8fabb679415fe6a9969015076873e211  0.576786773
## 7363aac88b5325ac49cb469e284e10dd 7363aac88b5325ac49cb469e284e10dd -0.342729973
## 2dc2cf60102ff00cd077f2304ec84d06 2dc2cf60102ff00cd077f2304ec84d06 -0.128825045
## cbb17c39cfea0aa843212c70619d1316 cbb17c39cfea0aa843212c70619d1316  0.044158797
## 555a3619c7746cdad6de2b1c181e791c 555a3619c7746cdad6de2b1c181e791c  0.380995407
## c1bb8cdcab0662dec8ca827a53614d5e c1bb8cdcab0662dec8ca827a53614d5e -0.027532888
## a3f9e91d15a189d0cf1e0596b9688957 a3f9e91d15a189d0cf1e0596b9688957  0.101324865
## a0798db0c868e9ad6089502956d68d87 a0798db0c868e9ad6089502956d68d87 -0.046179903
## c5859b7f5d14b8c145fbc3ad583c70e8 c5859b7f5d14b8c145fbc3ad583c70e8  0.404147740
## 41712010fd4dd22c723e5490a0184a5d 41712010fd4dd22c723e5490a0184a5d -0.021744729
## 590455138b9a5855dcb8f43167711f49 590455138b9a5855dcb8f43167711f49  0.043070597
## cab39ce0fdcec18719b175b8181bd917 cab39ce0fdcec18719b175b8181bd917 -0.119759676
## 4a4d11bb32e68079b4c71483bf0fcf36 4a4d11bb32e68079b4c71483bf0fcf36 -0.103967532
## d5c6c4e5dc8f5bc4536e4fc361eca630 d5c6c4e5dc8f5bc4536e4fc361eca630 -0.005942238
## e0b0817162b44d0e17a1c50486839924 e0b0817162b44d0e17a1c50486839924  0.045933874
## 6960eba3db7d4d863d042ab497d7481a 6960eba3db7d4d863d042ab497d7481a  0.036407589
## 8512cc461fa8b253841200871faba531 8512cc461fa8b253841200871faba531  0.150038728
## a38b6324c57e2ef3c842180166aeb60f a38b6324c57e2ef3c842180166aeb60f -0.018358062
## 172049e46605909f99651e4d0efb8578 172049e46605909f99651e4d0efb8578  0.108447214
## 6e13c195d4554dd8cf4923df9decd183 6e13c195d4554dd8cf4923df9decd183 -0.037038892
## d2a695aa271adf072749a729fd96a731 d2a695aa271adf072749a729fd96a731  0.166497178
## 42de4e368cca3ff50954f82c12dd5315 42de4e368cca3ff50954f82c12dd5315  0.344668553
## f966e124604e0e32b209b88df6e42cd4 f966e124604e0e32b209b88df6e42cd4 -0.030616172
## f7c08b0d8e574f7c089b95e0e4344d5e f7c08b0d8e574f7c089b95e0e4344d5e  0.040385990
## 1dda53416f231a3345668df39d4ae780 1dda53416f231a3345668df39d4ae780  0.102392419
## 8c6b152535efbebf12179855cb31b8d8 8c6b152535efbebf12179855cb31b8d8 -0.018020190
## 8aa67fe08404e3d574aafb6622786c0f 8aa67fe08404e3d574aafb6622786c0f -0.138264090
## ec70d97b11476164cbd828e126ad10da ec70d97b11476164cbd828e126ad10da -0.034953157
## 5c4ca852b40641b3eb0ad23e69bb6583 5c4ca852b40641b3eb0ad23e69bb6583 -0.144109533
## 8455a8f642c4b847f76baa31664bcfaa 8455a8f642c4b847f76baa31664bcfaa -0.037864621
## 9a299c48d0e3ddea2122806528070d6e 9a299c48d0e3ddea2122806528070d6e  0.018634102
## e4ae256bb51896c21795f743dc9ed9dd e4ae256bb51896c21795f743dc9ed9dd -0.115065267
## 7a297761643e32391ea7cd03b0995e62 7a297761643e32391ea7cd03b0995e62  0.099954054
## 73b4f14d16f02ee0ac82838166128de4 73b4f14d16f02ee0ac82838166128de4 -0.036197817
## 8f5ceded1cd7e86b8271d3fab24d322a 8f5ceded1cd7e86b8271d3fab24d322a -0.058082128
## e00f85cb3452e37943500e86afb268f4 e00f85cb3452e37943500e86afb268f4 -0.025905621
##                                   y zero_ind
## f1d7ac8c18c1d8144a00e4d785c86e4e  1       No
## 05404c9fdf9f3f334eb618bac3f434f6  0       No
## 06105df60508c2ed24a54f1b8ed64e49  0       No
## bb1b75f41ff9c9db1d1de41e8388eb52  0       No
## 1efe70e365249c0d2fc28580b6ba0529  0       No
## 098c3bbd8234f4ac198297ac0bde957d  0       No
## db8f48a3fe2fca95fb4986f5507b9076  0       No
## 3f5533292a655d0f54a64560d65e823b  0       No
## 7017e1746ae742d69c50c1274cc2f02e  0       No
## 803eb52cfe3d77bdf9fe14c011e425fb  0       No
## 5be4e26b904b9d711eb829785568ff92  0       No
## a41766e0533ab64a9966f362a7938478  0       No
## 4b11c7e76e289279c460920e677924c4  0       No
## 5f0cbc930515c7ec606289b79d8c4ba3  0       No
## 30bb271b2baa58b49fc0b15d7a72b322  0       No
## 2e3414cd356e335e4f675efeb43938b3  0       No
## 8fabb679415fe6a9969015076873e211 41       No
## 7363aac88b5325ac49cb469e284e10dd  1       No
## 2dc2cf60102ff00cd077f2304ec84d06  0       No
## cbb17c39cfea0aa843212c70619d1316  0       No
## 555a3619c7746cdad6de2b1c181e791c  0       No
## c1bb8cdcab0662dec8ca827a53614d5e  0       No
## a3f9e91d15a189d0cf1e0596b9688957  1       No
## a0798db0c868e9ad6089502956d68d87  0       No
## c5859b7f5d14b8c145fbc3ad583c70e8  5       No
## 41712010fd4dd22c723e5490a0184a5d  0       No
## 590455138b9a5855dcb8f43167711f49  0       No
## cab39ce0fdcec18719b175b8181bd917  1       No
## 4a4d11bb32e68079b4c71483bf0fcf36 35       No
## d5c6c4e5dc8f5bc4536e4fc361eca630  0       No
## e0b0817162b44d0e17a1c50486839924  0       No
## 6960eba3db7d4d863d042ab497d7481a  0       No
## 8512cc461fa8b253841200871faba531  0       No
## a38b6324c57e2ef3c842180166aeb60f  0       No
## 172049e46605909f99651e4d0efb8578  1       No
## 6e13c195d4554dd8cf4923df9decd183  0       No
## d2a695aa271adf072749a729fd96a731  1       No
## 42de4e368cca3ff50954f82c12dd5315  2       No
## f966e124604e0e32b209b88df6e42cd4  0       No
## f7c08b0d8e574f7c089b95e0e4344d5e  0       No
## 1dda53416f231a3345668df39d4ae780  2       No
## 8c6b152535efbebf12179855cb31b8d8  0       No
## 8aa67fe08404e3d574aafb6622786c0f  0       No
## ec70d97b11476164cbd828e126ad10da  0       No
## 5c4ca852b40641b3eb0ad23e69bb6583  0       No
## 8455a8f642c4b847f76baa31664bcfaa  0       No
## 9a299c48d0e3ddea2122806528070d6e  0       No
## e4ae256bb51896c21795f743dc9ed9dd  0       No
## 7a297761643e32391ea7cd03b0995e62  0       No
## 73b4f14d16f02ee0ac82838166128de4  0       No
## 8f5ceded1cd7e86b8271d3fab24d322a  0       No
## e00f85cb3452e37943500e86afb268f4  0       No
# Step 3: Volcano Plot
# Number of taxa except structural zeros
n_taxa = ifelse(is.null(struc_zero), nrow(feature_table), sum(apply(struc_zero, 1, sum) == 0))
# Cutoff values for declaring differentially abundant taxa
cut_off = c(0.9 * (n_taxa -1), 0.8 * (n_taxa -1), 0.7 * (n_taxa -1), 0.6 * (n_taxa -1))
names(cut_off) = c("detected_0.9", "detected_0.8", "detected_0.7", "detected_0.6")

# Annotation data
dat_ann = data.frame(x = min(res$fig$data$x), y = cut_off["detected_0.7"], label = "W[0.7]")

fig = res$fig +  
  geom_hline(yintercept = cut_off["detected_0.7"], linetype = "dashed") + 
  geom_text(data = dat_ann, aes(x = x, y = y, label = label), 
            size = 4, vjust = -0.5, hjust = 0, color = "orange", parse = TRUE)
fig  

# ORGANIZE PLOT

TaxaData2 <- readRDS("../../data/processed/microbiome/phyloseq_inputs/merged-taxonomy-dada2-retrained.rds")
rownames(TaxaData2) <- TaxaData2$Feature.ID
## Warning: Setting row names on a tibble is deprecated.
ancomRes <- res$out

test <- merge(ancomRes, TaxaData2, by.x = "taxa_id", by.y = "Feature.ID", all.x = TRUE, all.y = FALSE)

# Make plot 
tempdata <- test[test$detected_0.6 == TRUE,] # 0.6 but can make 0.7
asv_id <- unique(tempdata$taxa_id)

MicroData4 <- feature_table[row.names(feature_table) %in% asv_id,]

# Rename to the taxonomy 
#MicroData5 <- as.data.frame(as.matrix(t(MicroData4)))
TaxaData3 <- TaxaData2 %>% dplyr::select(Order, Family, Genus)
# Make a unique naming column that incorporates multiple levels of taxonomy
TaxaData3$OrderFamilyGenus <- paste(TaxaData3$Order, TaxaData3$Family, TaxaData3$Genus)
rownames(TaxaData3) <- rownames(TaxaData2)
## Warning: Setting row names on a tibble is deprecated.
# Merge
MicroData5 <- merge(MicroData4, TaxaData3,
                    by=0, all.x = T, all.y = F)

row.names(MicroData5) <- MicroData5$OrderFamilyGenus

MicroData5 <- MicroData5 %>%  dplyr::select(-Order, -Family, -Genus, -OrderFamilyGenus, -Row.names)
MicroData5 <- as.data.frame(as.matrix(t(MicroData5)))

# Merge with the phenotype data 
MetaData3 <- MetaData2 %>% dplyr::select(tmao_mdn)

ggData <- merge(MetaData3, MicroData5, by =0)
row.names(ggData) <- ggData$Row.names
ggData <- ggData %>% dplyr::select(-Row.names)

# Gather the data
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:S4Vectors':
## 
##     expand
ggData2 <- gather(ggData,
                  key = "Taxa",
                  value = "Abundance",
                  -tmao_mdn)
                
ggData2$Abundance <- log(ggData2$Abundance)

# Plot it
p <- ggplot(data = ggData2,
            aes(x = Taxa, 
                y = Abundance,
                fill = tmao_mdn), 
            alpha = 0.1) +
  geom_boxplot(lwd=.25) +
  xlab(NULL) +
  ylab("Log of Taxa Abundance") +
  theme_bw() + # note order matters here - need bw to be before theme()
  theme(axis.text.y = element_text( hjust = 1),
        axis.text.x = element_text(angle = 60, hjust = 1),
        text=element_text(size=11))  +
  scale_fill_manual(values=c("#2D708EFF", "#B8DE29FF")) + 
  #scale_fill_viridis(discrete = TRUE) +
  #coord_flip() +
  labs(fill = "TMAO \n Classification")

p
## Warning: Removed 341 rows containing non-finite values (stat_boxplot).

ANCOM2 covariate analysis

ANCOM2 lets you include covariates into the analysis. If we add the same covariates as in the DESeq2 analysis, do we get similar results?

source("../../Rscripts/ancom_v2.1.R")

# OTU table
MicroData <- otu_table(PSOtmao1G)
MicroData <- as.data.frame(as.matrix(MicroData), stringsAsFactors = F)
# Add 1
#MicroData2 <- MicroData +1
#MicroData2 <- as.data.frame(as.matrix(t(MicroData2),stringsAsFactors = F))
#dim(MicroData2) # 103 x 98

# Load the phenotype data
#------------------------
# Perform ANCOM analysis
# Step 1: Data preprocessing
# OTU data
otu_data <- otu_table(PSOtmao1G)
otu_data <- as.data.frame(as.matrix(otu_data), stringsAsFactors = F)
otu_id <- rownames(otu_data)

# Metadata 
#df <- readRDS("../data_processed/mapping/Biocrates_mapping_BD1_210809.rds")
#df$p.Cresol.SO4_bd1_mdn <- plyr::revalue(df$p.Cresol.SO4_bd1_mdn, c("0" = "< Median", "1" = "> Median"))
# Subset
MetaData2 <- df2 %>% dplyr::select(c(tmao_mdn, age, sex, bmi_final))
MetaData2$Sample.ID <- rownames(MetaData2)


feature_table = otu_data; sample_var = "Sample.ID"; group_var = NULL
out_cut = 0.05; zero_cut = 0.95; lib_cut = 1000; neg_lb = FALSE # make zero_cut=0.95 to match criteria used for other metabolites
# Function
prepro = feature_table_pre_process(feature_table = feature_table,
                                   meta_data = MetaData2,
                                   sample_var = sample_var,
                                   group_var = NULL,
                                   out_cut = out_cut,
                                   zero_cut = zero_cut,
                                   lib_cut = lib_cut,
                                   neg_lb = neg_lb)

# Step 2: ANCOM
feature_table = prepro$feature_table 
meta_data = prepro$meta_data
struc_zero = prepro$structure_zeros

# Define variables for ANCOM
main_var = "tmao_mdn"; p_adj_method = "BH"; alpha = 0.05
adj_formula = c("age + sex + bmi_final"); rand_formula = NULL
# Run ANCCOM
t_start = Sys.time()
res = ANCOM(feature_table = feature_table, 
            meta_data = meta_data, 
            struc_zero = struc_zero, 
            main_var = main_var, 
            p_adj_method = p_adj_method, 
            alpha = alpha, 
            adj_formula = adj_formula, 
            rand_formula = rand_formula)
t_end = Sys.time()
t_run = t_end - t_start 
t_run # around 5s
## Time difference of 3.597206 secs
# Results
res$out
##                             taxa_id  W detected_0.9 detected_0.8 detected_0.7
## 1  f1d7ac8c18c1d8144a00e4d785c86e4e  1        FALSE        FALSE        FALSE
## 2  05404c9fdf9f3f334eb618bac3f434f6  0        FALSE        FALSE        FALSE
## 3  06105df60508c2ed24a54f1b8ed64e49  0        FALSE        FALSE        FALSE
## 4  bb1b75f41ff9c9db1d1de41e8388eb52  0        FALSE        FALSE        FALSE
## 5  1efe70e365249c0d2fc28580b6ba0529  0        FALSE        FALSE        FALSE
## 6  098c3bbd8234f4ac198297ac0bde957d  0        FALSE        FALSE        FALSE
## 7  db8f48a3fe2fca95fb4986f5507b9076  0        FALSE        FALSE        FALSE
## 8  3f5533292a655d0f54a64560d65e823b  0        FALSE        FALSE        FALSE
## 9  7017e1746ae742d69c50c1274cc2f02e  0        FALSE        FALSE        FALSE
## 10 803eb52cfe3d77bdf9fe14c011e425fb  0        FALSE        FALSE        FALSE
## 11 5be4e26b904b9d711eb829785568ff92  0        FALSE        FALSE        FALSE
## 12 a41766e0533ab64a9966f362a7938478  0        FALSE        FALSE        FALSE
## 13 4b11c7e76e289279c460920e677924c4  0        FALSE        FALSE        FALSE
## 14 5f0cbc930515c7ec606289b79d8c4ba3  0        FALSE        FALSE        FALSE
## 15 30bb271b2baa58b49fc0b15d7a72b322  0        FALSE        FALSE        FALSE
## 16 2e3414cd356e335e4f675efeb43938b3  0        FALSE        FALSE        FALSE
## 17 8fabb679415fe6a9969015076873e211 31        FALSE        FALSE        FALSE
## 18 7363aac88b5325ac49cb469e284e10dd  1        FALSE        FALSE        FALSE
## 19 2dc2cf60102ff00cd077f2304ec84d06  0        FALSE        FALSE        FALSE
## 20 cbb17c39cfea0aa843212c70619d1316  0        FALSE        FALSE        FALSE
## 21 555a3619c7746cdad6de2b1c181e791c  0        FALSE        FALSE        FALSE
## 22 c1bb8cdcab0662dec8ca827a53614d5e  0        FALSE        FALSE        FALSE
## 23 a3f9e91d15a189d0cf1e0596b9688957  0        FALSE        FALSE        FALSE
## 24 a0798db0c868e9ad6089502956d68d87  0        FALSE        FALSE        FALSE
## 25 c5859b7f5d14b8c145fbc3ad583c70e8  1        FALSE        FALSE        FALSE
## 26 41712010fd4dd22c723e5490a0184a5d  1        FALSE        FALSE        FALSE
## 27 590455138b9a5855dcb8f43167711f49  0        FALSE        FALSE        FALSE
## 28 cab39ce0fdcec18719b175b8181bd917  0        FALSE        FALSE        FALSE
## 29 4a4d11bb32e68079b4c71483bf0fcf36 29        FALSE        FALSE        FALSE
## 30 d5c6c4e5dc8f5bc4536e4fc361eca630  0        FALSE        FALSE        FALSE
## 31 e0b0817162b44d0e17a1c50486839924  0        FALSE        FALSE        FALSE
## 32 6960eba3db7d4d863d042ab497d7481a  0        FALSE        FALSE        FALSE
## 33 8512cc461fa8b253841200871faba531  0        FALSE        FALSE        FALSE
## 34 a38b6324c57e2ef3c842180166aeb60f  0        FALSE        FALSE        FALSE
## 35 172049e46605909f99651e4d0efb8578  1        FALSE        FALSE        FALSE
## 36 6e13c195d4554dd8cf4923df9decd183  0        FALSE        FALSE        FALSE
## 37 d2a695aa271adf072749a729fd96a731  1        FALSE        FALSE        FALSE
## 38 42de4e368cca3ff50954f82c12dd5315  1        FALSE        FALSE        FALSE
## 39 f966e124604e0e32b209b88df6e42cd4  0        FALSE        FALSE        FALSE
## 40 f7c08b0d8e574f7c089b95e0e4344d5e  0        FALSE        FALSE        FALSE
## 41 1dda53416f231a3345668df39d4ae780  0        FALSE        FALSE        FALSE
## 42 8c6b152535efbebf12179855cb31b8d8  0        FALSE        FALSE        FALSE
## 43 8aa67fe08404e3d574aafb6622786c0f  0        FALSE        FALSE        FALSE
## 44 ec70d97b11476164cbd828e126ad10da  0        FALSE        FALSE        FALSE
## 45 5c4ca852b40641b3eb0ad23e69bb6583  0        FALSE        FALSE        FALSE
## 46 8455a8f642c4b847f76baa31664bcfaa  0        FALSE        FALSE        FALSE
## 47 9a299c48d0e3ddea2122806528070d6e  0        FALSE        FALSE        FALSE
## 48 e4ae256bb51896c21795f743dc9ed9dd  0        FALSE        FALSE        FALSE
## 49 7a297761643e32391ea7cd03b0995e62  1        FALSE        FALSE        FALSE
## 50 73b4f14d16f02ee0ac82838166128de4  0        FALSE        FALSE        FALSE
## 51 8f5ceded1cd7e86b8271d3fab24d322a  0        FALSE        FALSE        FALSE
## 52 e00f85cb3452e37943500e86afb268f4  0        FALSE        FALSE        FALSE
##    detected_0.6
## 1         FALSE
## 2         FALSE
## 3         FALSE
## 4         FALSE
## 5         FALSE
## 6         FALSE
## 7         FALSE
## 8         FALSE
## 9         FALSE
## 10        FALSE
## 11        FALSE
## 12        FALSE
## 13        FALSE
## 14        FALSE
## 15        FALSE
## 16        FALSE
## 17         TRUE
## 18        FALSE
## 19        FALSE
## 20        FALSE
## 21        FALSE
## 22        FALSE
## 23        FALSE
## 24        FALSE
## 25        FALSE
## 26        FALSE
## 27        FALSE
## 28        FALSE
## 29        FALSE
## 30        FALSE
## 31        FALSE
## 32        FALSE
## 33        FALSE
## 34        FALSE
## 35        FALSE
## 36        FALSE
## 37        FALSE
## 38        FALSE
## 39        FALSE
## 40        FALSE
## 41        FALSE
## 42        FALSE
## 43        FALSE
## 44        FALSE
## 45        FALSE
## 46        FALSE
## 47        FALSE
## 48        FALSE
## 49        FALSE
## 50        FALSE
## 51        FALSE
## 52        FALSE
res$fig$data
##                                                           taxa_id            x
## f1d7ac8c18c1d8144a00e4d785c86e4e f1d7ac8c18c1d8144a00e4d785c86e4e -0.086218666
## 05404c9fdf9f3f334eb618bac3f434f6 05404c9fdf9f3f334eb618bac3f434f6 -0.094009689
## 06105df60508c2ed24a54f1b8ed64e49 06105df60508c2ed24a54f1b8ed64e49 -0.065438528
## bb1b75f41ff9c9db1d1de41e8388eb52 bb1b75f41ff9c9db1d1de41e8388eb52 -0.124785851
## 1efe70e365249c0d2fc28580b6ba0529 1efe70e365249c0d2fc28580b6ba0529 -0.152379131
## 098c3bbd8234f4ac198297ac0bde957d 098c3bbd8234f4ac198297ac0bde957d  0.041043102
## db8f48a3fe2fca95fb4986f5507b9076 db8f48a3fe2fca95fb4986f5507b9076 -0.277701763
## 3f5533292a655d0f54a64560d65e823b 3f5533292a655d0f54a64560d65e823b -0.182837956
## 7017e1746ae742d69c50c1274cc2f02e 7017e1746ae742d69c50c1274cc2f02e -0.006409051
## 803eb52cfe3d77bdf9fe14c011e425fb 803eb52cfe3d77bdf9fe14c011e425fb  0.032416779
## 5be4e26b904b9d711eb829785568ff92 5be4e26b904b9d711eb829785568ff92  0.023829617
## a41766e0533ab64a9966f362a7938478 a41766e0533ab64a9966f362a7938478 -0.235471065
## 4b11c7e76e289279c460920e677924c4 4b11c7e76e289279c460920e677924c4  0.012949460
## 5f0cbc930515c7ec606289b79d8c4ba3 5f0cbc930515c7ec606289b79d8c4ba3 -0.005617348
## 30bb271b2baa58b49fc0b15d7a72b322 30bb271b2baa58b49fc0b15d7a72b322 -0.034044178
## 2e3414cd356e335e4f675efeb43938b3 2e3414cd356e335e4f675efeb43938b3 -0.018012080
## 8fabb679415fe6a9969015076873e211 8fabb679415fe6a9969015076873e211  0.576786773
## 7363aac88b5325ac49cb469e284e10dd 7363aac88b5325ac49cb469e284e10dd -0.342729973
## 2dc2cf60102ff00cd077f2304ec84d06 2dc2cf60102ff00cd077f2304ec84d06 -0.128825045
## cbb17c39cfea0aa843212c70619d1316 cbb17c39cfea0aa843212c70619d1316  0.044158797
## 555a3619c7746cdad6de2b1c181e791c 555a3619c7746cdad6de2b1c181e791c  0.380995407
## c1bb8cdcab0662dec8ca827a53614d5e c1bb8cdcab0662dec8ca827a53614d5e -0.027532888
## a3f9e91d15a189d0cf1e0596b9688957 a3f9e91d15a189d0cf1e0596b9688957  0.101324865
## a0798db0c868e9ad6089502956d68d87 a0798db0c868e9ad6089502956d68d87 -0.046179903
## c5859b7f5d14b8c145fbc3ad583c70e8 c5859b7f5d14b8c145fbc3ad583c70e8  0.404147740
## 41712010fd4dd22c723e5490a0184a5d 41712010fd4dd22c723e5490a0184a5d -0.021744729
## 590455138b9a5855dcb8f43167711f49 590455138b9a5855dcb8f43167711f49  0.043070597
## cab39ce0fdcec18719b175b8181bd917 cab39ce0fdcec18719b175b8181bd917 -0.119759676
## 4a4d11bb32e68079b4c71483bf0fcf36 4a4d11bb32e68079b4c71483bf0fcf36 -0.103967532
## d5c6c4e5dc8f5bc4536e4fc361eca630 d5c6c4e5dc8f5bc4536e4fc361eca630 -0.005942238
## e0b0817162b44d0e17a1c50486839924 e0b0817162b44d0e17a1c50486839924  0.045933874
## 6960eba3db7d4d863d042ab497d7481a 6960eba3db7d4d863d042ab497d7481a  0.036407589
## 8512cc461fa8b253841200871faba531 8512cc461fa8b253841200871faba531  0.150038728
## a38b6324c57e2ef3c842180166aeb60f a38b6324c57e2ef3c842180166aeb60f -0.018358062
## 172049e46605909f99651e4d0efb8578 172049e46605909f99651e4d0efb8578  0.108447214
## 6e13c195d4554dd8cf4923df9decd183 6e13c195d4554dd8cf4923df9decd183 -0.037038892
## d2a695aa271adf072749a729fd96a731 d2a695aa271adf072749a729fd96a731  0.166497178
## 42de4e368cca3ff50954f82c12dd5315 42de4e368cca3ff50954f82c12dd5315  0.344668553
## f966e124604e0e32b209b88df6e42cd4 f966e124604e0e32b209b88df6e42cd4 -0.030616172
## f7c08b0d8e574f7c089b95e0e4344d5e f7c08b0d8e574f7c089b95e0e4344d5e  0.040385990
## 1dda53416f231a3345668df39d4ae780 1dda53416f231a3345668df39d4ae780  0.102392419
## 8c6b152535efbebf12179855cb31b8d8 8c6b152535efbebf12179855cb31b8d8 -0.018020190
## 8aa67fe08404e3d574aafb6622786c0f 8aa67fe08404e3d574aafb6622786c0f -0.138264090
## ec70d97b11476164cbd828e126ad10da ec70d97b11476164cbd828e126ad10da -0.034953157
## 5c4ca852b40641b3eb0ad23e69bb6583 5c4ca852b40641b3eb0ad23e69bb6583 -0.144109533
## 8455a8f642c4b847f76baa31664bcfaa 8455a8f642c4b847f76baa31664bcfaa -0.037864621
## 9a299c48d0e3ddea2122806528070d6e 9a299c48d0e3ddea2122806528070d6e  0.018634102
## e4ae256bb51896c21795f743dc9ed9dd e4ae256bb51896c21795f743dc9ed9dd -0.115065267
## 7a297761643e32391ea7cd03b0995e62 7a297761643e32391ea7cd03b0995e62  0.099954054
## 73b4f14d16f02ee0ac82838166128de4 73b4f14d16f02ee0ac82838166128de4 -0.036197817
## 8f5ceded1cd7e86b8271d3fab24d322a 8f5ceded1cd7e86b8271d3fab24d322a -0.058082128
## e00f85cb3452e37943500e86afb268f4 e00f85cb3452e37943500e86afb268f4 -0.025905621
##                                   y zero_ind
## f1d7ac8c18c1d8144a00e4d785c86e4e  1       No
## 05404c9fdf9f3f334eb618bac3f434f6  0       No
## 06105df60508c2ed24a54f1b8ed64e49  0       No
## bb1b75f41ff9c9db1d1de41e8388eb52  0       No
## 1efe70e365249c0d2fc28580b6ba0529  0       No
## 098c3bbd8234f4ac198297ac0bde957d  0       No
## db8f48a3fe2fca95fb4986f5507b9076  0       No
## 3f5533292a655d0f54a64560d65e823b  0       No
## 7017e1746ae742d69c50c1274cc2f02e  0       No
## 803eb52cfe3d77bdf9fe14c011e425fb  0       No
## 5be4e26b904b9d711eb829785568ff92  0       No
## a41766e0533ab64a9966f362a7938478  0       No
## 4b11c7e76e289279c460920e677924c4  0       No
## 5f0cbc930515c7ec606289b79d8c4ba3  0       No
## 30bb271b2baa58b49fc0b15d7a72b322  0       No
## 2e3414cd356e335e4f675efeb43938b3  0       No
## 8fabb679415fe6a9969015076873e211 31       No
## 7363aac88b5325ac49cb469e284e10dd  1       No
## 2dc2cf60102ff00cd077f2304ec84d06  0       No
## cbb17c39cfea0aa843212c70619d1316  0       No
## 555a3619c7746cdad6de2b1c181e791c  0       No
## c1bb8cdcab0662dec8ca827a53614d5e  0       No
## a3f9e91d15a189d0cf1e0596b9688957  0       No
## a0798db0c868e9ad6089502956d68d87  0       No
## c5859b7f5d14b8c145fbc3ad583c70e8  1       No
## 41712010fd4dd22c723e5490a0184a5d  1       No
## 590455138b9a5855dcb8f43167711f49  0       No
## cab39ce0fdcec18719b175b8181bd917  0       No
## 4a4d11bb32e68079b4c71483bf0fcf36 29       No
## d5c6c4e5dc8f5bc4536e4fc361eca630  0       No
## e0b0817162b44d0e17a1c50486839924  0       No
## 6960eba3db7d4d863d042ab497d7481a  0       No
## 8512cc461fa8b253841200871faba531  0       No
## a38b6324c57e2ef3c842180166aeb60f  0       No
## 172049e46605909f99651e4d0efb8578  1       No
## 6e13c195d4554dd8cf4923df9decd183  0       No
## d2a695aa271adf072749a729fd96a731  1       No
## 42de4e368cca3ff50954f82c12dd5315  1       No
## f966e124604e0e32b209b88df6e42cd4  0       No
## f7c08b0d8e574f7c089b95e0e4344d5e  0       No
## 1dda53416f231a3345668df39d4ae780  0       No
## 8c6b152535efbebf12179855cb31b8d8  0       No
## 8aa67fe08404e3d574aafb6622786c0f  0       No
## ec70d97b11476164cbd828e126ad10da  0       No
## 5c4ca852b40641b3eb0ad23e69bb6583  0       No
## 8455a8f642c4b847f76baa31664bcfaa  0       No
## 9a299c48d0e3ddea2122806528070d6e  0       No
## e4ae256bb51896c21795f743dc9ed9dd  0       No
## 7a297761643e32391ea7cd03b0995e62  1       No
## 73b4f14d16f02ee0ac82838166128de4  0       No
## 8f5ceded1cd7e86b8271d3fab24d322a  0       No
## e00f85cb3452e37943500e86afb268f4  0       No
# Step 3: Volcano Plot
# Number of taxa except structural zeros
n_taxa = ifelse(is.null(struc_zero), nrow(feature_table), sum(apply(struc_zero, 1, sum) == 0))
# Cutoff values for declaring differentially abundant taxa
cut_off = c(0.9 * (n_taxa -1), 0.8 * (n_taxa -1), 0.7 * (n_taxa -1), 0.6 * (n_taxa -1))
names(cut_off) = c("detected_0.9", "detected_0.8", "detected_0.7", "detected_0.6")

# Annotation data
dat_ann = data.frame(x = min(res$fig$data$x), y = cut_off["detected_0.7"], label = "W[0.7]")

fig = res$fig +  
  geom_hline(yintercept = cut_off["detected_0.7"], linetype = "dashed") + 
  geom_text(data = dat_ann, aes(x = x, y = y, label = label), 
            size = 4, vjust = -0.5, hjust = 0, color = "orange", parse = TRUE)
fig  

# ORGANIZE PLOT

TaxaData2 <- readRDS("../../data/processed/microbiome/phyloseq_inputs/merged-taxonomy-dada2-retrained.rds")
rownames(TaxaData2) <- TaxaData2$Feature.ID
## Warning: Setting row names on a tibble is deprecated.
ancomRes <- res$out

test <- merge(ancomRes, TaxaData2, by.x = "taxa_id", by.y = "Feature.ID", all.x = TRUE, all.y = FALSE)

# Make plot 
tempdata <- test[test$detected_0.6 == TRUE,] # 0.6 but can make 0.7
asv_id <- unique(tempdata$taxa_id)

MicroData4 <- feature_table[row.names(feature_table) %in% asv_id,]

# Rename to the taxonomy 
#MicroData5 <- as.data.frame(as.matrix(t(MicroData4)))
TaxaData3 <- TaxaData2 %>% dplyr::select(Order, Family, Genus)
# Make a unique naming column that incorporates multiple levels of taxonomy
TaxaData3$OrderFamilyGenus <- paste(TaxaData3$Order, TaxaData3$Family, TaxaData3$Genus)
rownames(TaxaData3) <- rownames(TaxaData2)
## Warning: Setting row names on a tibble is deprecated.
# Merge
MicroData5 <- merge(MicroData4, TaxaData3,
                    by=0, all.x = T, all.y = F)

row.names(MicroData5) <- MicroData5$OrderFamilyGenus

MicroData5 <- MicroData5 %>%  dplyr::select(-Order, -Family, -Genus, -OrderFamilyGenus, -Row.names)
MicroData5 <- as.data.frame(as.matrix(t(MicroData5)))

# Merge with the phenotype data 
MetaData3 <- MetaData2 %>% dplyr::select(tmao_mdn)

ggData <- merge(MetaData3, MicroData5, by =0)
row.names(ggData) <- ggData$Row.names
ggData <- ggData %>% dplyr::select(-Row.names)

# Gather the data
library(tidyr)
ggData2 <- gather(ggData,
                  key = "Taxa",
                  value = "Abundance",
                  -tmao_mdn)
                
ggData2$Abundance <- log(ggData2$Abundance)

# Plot it
p <- ggplot(data = ggData2,
            aes(x = Taxa, 
                y = Abundance,
                fill = tmao_mdn), 
            alpha = 0.1) +
  geom_boxplot(lwd=.25) +
  xlab(NULL) +
  ylab("Log of Taxa Abundance") +
  theme_bw() + # note order matters here - need bw to be before theme()
  theme(axis.text.y = element_text( hjust = 1),
        axis.text.x = element_text(angle = 60, hjust = 1),
        text=element_text(size=11))  +
  scale_fill_manual(values=c("#2D708EFF", "#B8DE29FF")) + 
  #scale_fill_viridis(discrete = TRUE) +
  #coord_flip() +
  labs(fill = "TMAO \n Classification")

p
## Warning: Removed 158 rows containing non-finite values (stat_boxplot).

# Stats for paper

Take a TMAO centric view point. So for the following questions, consider the whole group and the TMAO quantiles/tertiles.

# Get sample data groupings
sampleData <- as.data.frame(sample_data(PSOtmao1))
sampleData <- subset(sampleData, select = c(tmao_mdn, tmao_quantile, tmao_tertile, tmao))

# Make lists of participants in each group
tertile1 <- rownames(sampleData[sampleData$tmao_tertile == "Tertile1",])
tertile2 <- rownames(sampleData[sampleData$tmao_tertile == "Tertile2",])
tertile3 <- rownames(sampleData[sampleData$tmao_tertile == "Tertile3",])

# Get OTU table
otuTbl_PSO1 <- as.data.frame(otu_table(PSOtmao1))
otuTbl_PSO1 <- as.data.frame(t(otuTbl_PSO1)) # subjects to rows

# Get tax table
taxaData <- as.data.frame(tax_table(PSOtmao1))
taxaData$PCOFG <- paste0(taxaData$Phylum, taxaData$Class, taxaData$Order, taxaData$Family, taxaData$Genus)

How many OTUs (not glommed) in analysis?

#~~~~~~~~~~
# Tertile 1
#~~~~~~~~~~
# What is the average number of OTUs per participant in tertile 1?
taxa_occurrences_T1 <- sapply(otuTbl_PSO1[row.names(otuTbl_PSO1) %in% tertile1,], function(x) sum(x>0))
summary(taxa_occurrences_T1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    9.00   17.00   28.47   41.00  117.00
otuTbl_PSO1_smpCol <- as.data.frame(t(otuTbl_PSO1))
ppl_occurrences_T1 <- sapply(otuTbl_PSO1_smpCol[colnames(otuTbl_PSO1_smpCol) %in% tertile1,], function(x) sum(x>0))
summary(ppl_occurrences_T1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   13.00   37.00   44.00   44.34   51.00   72.00
#~~~~~~~~~~
# Tertile 2
#~~~~~~~~~~
# What is the average number of OTUs per participant in tertile 1?
taxa_occurrences_T2 <- sapply(otuTbl_PSO1[row.names(otuTbl_PSO1) %in% tertile2,], function(x) sum(x>0))
summary(taxa_occurrences_T2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   11.00   20.00   30.85   44.00  118.00
otuTbl_PSO1_smpCol <- as.data.frame(t(otuTbl_PSO1))
ppl_occurrences_T2 <- sapply(otuTbl_PSO1_smpCol[colnames(otuTbl_PSO1_smpCol) %in% tertile2,], function(x) sum(x>0))
summary(ppl_occurrences_T2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.00   39.00   47.00   45.66   53.00   69.00
#~~~~~~~~~~
# Tertile 3
#~~~~~~~~~~
# What is the average number of OTUs per participant in tertile 1?
taxa_occurrences_T3 <- sapply(otuTbl_PSO1[row.names(otuTbl_PSO1) %in% tertile3,], function(x) sum(x>0))
summary(taxa_occurrences_T3)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   10.00   20.00   31.46   44.25  117.00
otuTbl_PSO1_smpCol <- as.data.frame(t(otuTbl_PSO1))
ppl_occurrences_T3 <- sapply(otuTbl_PSO1_smpCol[colnames(otuTbl_PSO1_smpCol) %in% tertile3,], function(x) sum(x>0))
summary(ppl_occurrences_T3)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.00   34.00   41.00   39.92   46.00   66.00

How many OTUs are present in 5%, 25%, 50% of samples?

What is the firmicutes to bacteroidetes ratio per tertile? How common are Firmicutes? Mean, SD, range? How common are Bacteroidetes? Mean, SD, range?

PSOtmao1_Phylum <- tax_glom(PSOtmao1, "Phylum", NArm = TRUE)
PSOtmao1_Phylum_tertile1 <- subset_samples(PSOtmao1_Phylum, tmao_tertile == "Tertile1")
PSOtmao1_Phylum_tertile2 <- subset_samples(PSOtmao1_Phylum, tmao_tertile == "Tertile2")
PSOtmao1_Phylum_tertile3 <- subset_samples(PSOtmao1_Phylum, tmao_tertile == "Tertile3")

#~~~~~~~~~~
# Tertile 1
#~~~~~~~~~~
# Prevalence - how many samples the taxa is observed in
prevalence_Phy_tert1 = apply(X = otu_table(PSOtmao1_Phylum_tertile1),
               MARGIN = ifelse(taxa_are_rows(PSOtmao1_Phylum_tertile1), yes = 1, no = 2),
               FUN = function(x){sum(x > 0)})

prevalence_Phy_tert1 = data.frame(Prevalence = prevalence_Phy_tert1,
                    TotalAbundance = taxa_sums(PSOtmao1_Phylum_tertile1), # the total number of occurrences (even if only observed 50 times in 1 sample, the number would be 50)
                    tax_table(PSOtmao1_Phylum_tertile1))

# Firmicutes:Bacteroidetes Abundance Ratio
firm_bacter_t1 <- prevalence_Phy_tert1$TotalAbundance[prevalence_Phy_tert1$Phylum == " p__Firmicutes"] / prevalence_Phy_tert1$TotalAbundance[prevalence_Phy_tert1$Phylum == " p__Bacteroidetes"] 

#~~~~~~~~~~
# Tertile 2
#~~~~~~~~~~
# Prevalence - how many samples the taxa is observed in
prevalence_Phy_tert2 = apply(X = otu_table(PSOtmao1_Phylum_tertile2),
               MARGIN = ifelse(taxa_are_rows(PSOtmao1_Phylum_tertile2), yes = 1, no = 2),
               FUN = function(x){sum(x > 0)})

prevalence_Phy_tert2 = data.frame(Prevalence = prevalence_Phy_tert2,
                    TotalAbundance = taxa_sums(PSOtmao1_Phylum_tertile2), # the total number of occurrences (even if only observed 50 times in 1 sample, the number would be 50)
                    tax_table(PSOtmao1_Phylum_tertile2))

firm_bacter_t2 <- prevalence_Phy_tert2$TotalAbundance[prevalence_Phy_tert2$Phylum == " p__Firmicutes"] / prevalence_Phy_tert2$TotalAbundance[prevalence_Phy_tert2$Phylum == " p__Bacteroidetes"] 

#~~~~~~~~~~
# Tertile 3
#~~~~~~~~~~
# Prevalence - how many samples the taxa is observed in
prevalence_Phy_tert3 = apply(X = otu_table(PSOtmao1_Phylum_tertile3),
               MARGIN = ifelse(taxa_are_rows(PSOtmao1_Phylum_tertile3), yes = 1, no = 2),
               FUN = function(x){sum(x > 0)})

prevalence_Phy_tert3 = data.frame(Prevalence = prevalence_Phy_tert3,
                    TotalAbundance = taxa_sums(PSOtmao1_Phylum_tertile3), # the total number of occurrences (even if only observed 50 times in 1 sample, the number would be 50)
                    tax_table(PSOtmao1_Phylum_tertile3))

firm_bacter_t3 <- prevalence_Phy_tert3$TotalAbundance[prevalence_Phy_tert3$Phylum == " p__Firmicutes"] / prevalence_Phy_tert3$TotalAbundance[prevalence_Phy_tert3$Phylum == " p__Bacteroidetes"] 

# Compare ratios
cat("Firmicutes:Bacteroidetes ratio tertile 1:", firm_bacter_t1)
## Firmicutes:Bacteroidetes ratio tertile 1: 3.476189
cat("Firmicutes:Bacteroidetes ratio tertile 2:", firm_bacter_t2)
## Firmicutes:Bacteroidetes ratio tertile 2: 4.427447
cat("Firmicutes:Bacteroidetes ratio tertile 3:", firm_bacter_t3)
## Firmicutes:Bacteroidetes ratio tertile 3: 4.76323

What is the % abundance of a given taxa? Get summary statistics per tertile.

Range percentage of taxa per person = n occurrences of bug/total occurrences of all bugs * 100

#~~~~~~~~~~
# Tertile 1
#~~~~~~~~~~
trans <- transform_sample_counts(PSOtmao1_Phylum_tertile1, function(x) x / sum(x))
melted <- psmelt(trans)
subdf <- melted[melted$Phylum == " p__Firmicutes", ]
cat("Summmary of Firmicute abundance in tertile 1:")
## Summmary of Firmicute abundance in tertile 1:
summary(subdf$Abundance)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1953  0.7177  0.8451  0.7823  0.8963  0.9700
#~~~~~~~~~~
# Tertile 2
#~~~~~~~~~~
trans <- transform_sample_counts(PSOtmao1_Phylum_tertile2, function(x) x / sum(x))
melted <- psmelt(trans)
subdf <- melted[melted$Phylum == " p__Firmicutes", ]
cat("Summmary of Firmicute abundance in tertile 2:")
## Summmary of Firmicute abundance in tertile 2:
summary(subdf$Abundance)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3883  0.7556  0.8306  0.8030  0.8812  0.9995
#~~~~~~~~~~
# Tertile 3
#~~~~~~~~~~
trans <- transform_sample_counts(PSOtmao1_Phylum_tertile3, function(x) x / sum(x))
melted <- psmelt(trans)
subdf <- melted[melted$Phylum == " p__Firmicutes", ]
cat("Summmary of Firmicute abundance in tertile 3:")
## Summmary of Firmicute abundance in tertile 3:
summary(subdf$Abundance)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3361  0.7816  0.8399  0.8175  0.8994  0.9584
#~~~~~~~~~~
# Tertile 1
#~~~~~~~~~~
trans <- transform_sample_counts(PSOtmao1_Phylum_tertile1, function(x) x / sum(x))
melted <- psmelt(trans)
subdf <- melted[melted$Phylum == " p__Bacteroidetes", ]
cat("Summmary of Bacteroidetes abundance in tertile 1:")
## Summmary of Bacteroidetes abundance in tertile 1:
summary(subdf$Abundance)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.005037 0.076650 0.135212 0.190595 0.258695 0.799678
#~~~~~~~~~~
# Tertile 2
#~~~~~~~~~~
trans <- transform_sample_counts(PSOtmao1_Phylum_tertile2, function(x) x / sum(x))
melted <- psmelt(trans)
subdf <- melted[melted$Phylum == " p__Bacteroidetes", ]
cat("Summmary of Bacteroidetes abundance in tertile 2:")
## Summmary of Bacteroidetes abundance in tertile 2:
summary(subdf$Abundance)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0005031 0.0951410 0.1450016 0.1744789 0.2134875 0.5996346
#~~~~~~~~~~
# Tertile 3
#~~~~~~~~~~
trans <- transform_sample_counts(PSOtmao1_Phylum_tertile3, function(x) x / sum(x))
melted <- psmelt(trans)
subdf <- melted[melted$Phylum == " p__Bacteroidetes", ]
cat("Summmary of Bacteroidetes abundance in tertile 3:")
## Summmary of Bacteroidetes abundance in tertile 3:
summary(subdf$Abundance)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.02117 0.08077 0.12667 0.15860 0.18713 0.66188

Genus: What are the most common genus? Use the PSOtmao1G, subset to low-tertile, make prevalence data frame, sort in descending order. Repeat for other tertiles.

# Subset by tertiles 
PSOtmao1_tertile1 <- subset_samples(PSOtmao1, tmao_tertile == "Tertile1")
PSOtmao1_tertile2 <- subset_samples(PSOtmao1, tmao_tertile == "Tertile2")
PSOtmao1_tertile3 <- subset_samples(PSOtmao1, tmao_tertile == "Tertile3")

# How many taxa are there per tertile?
cat("n taxa in tertile 1:", ntaxa(PSOtmao1_tertile1))
## n taxa in tertile 1: 508
cat("n taxa in tertile 2:", ntaxa(PSOtmao1_tertile2))
## n taxa in tertile 2: 508
cat("n taxa in tertile 3:", ntaxa(PSOtmao1_tertile3))
## n taxa in tertile 3: 508
# Now subset at the genus level to identify the most common genus
# Subset by tertiles (Genus)
PSOtmao1G_tertile1 <- subset_samples(PSOtmao1G, tmao_tertile == "Tertile1")
PSOtmao1G_tertile2 <- subset_samples(PSOtmao1G, tmao_tertile == "Tertile2")
PSOtmao1G_tertile3 <- subset_samples(PSOtmao1G, tmao_tertile == "Tertile3")

#~~~~~~~~~~
# Tertile 1
#~~~~~~~~~~
# Prevalence - how many samples the taxa is observed in
prevalence_tert1 = apply(X = otu_table(PSOtmao1G_tertile1),
               MARGIN = ifelse(taxa_are_rows(PSOtmao1G_tertile1), yes = 1, no = 2),
               FUN = function(x){sum(x > 0)})

prevalence_tert1 = data.frame(Prevalence = prevalence_tert1,
                    TotalAbundance = taxa_sums(PSOtmao1G_tertile1), # the total number of occurrences (even if only observed 50 times in 1 sample, the number would be 50)
                    tax_table(PSOtmao1G_tertile1))


#~~~~~~~~~~
# Tertile 2
#~~~~~~~~~~
# Prevalence - how many samples the taxa is observed in
prevalence_tert2 = apply(X = otu_table(PSOtmao1G_tertile2),
               MARGIN = ifelse(taxa_are_rows(PSOtmao1G_tertile2), yes = 1, no = 2),
               FUN = function(x){sum(x > 0)})

prevalence_tert2 = data.frame(Prevalence = prevalence_tert2,
                    TotalAbundance = taxa_sums(PSOtmao1G_tertile2), # the total number of occurrences (even if only observed 50 times in 1 sample, the number would be 50)
                    tax_table(PSOtmao1G_tertile2))
#~~~~~~~~~~
# Tertile 3
#~~~~~~~~~~
# Prevalence - how many samples the taxa is observed in
prevalence_tert3 = apply(X = otu_table(PSOtmao1G_tertile3),
               MARGIN = ifelse(taxa_are_rows(PSOtmao1G_tertile3), yes = 1, no = 2),
               FUN = function(x){sum(x > 0)})

prevalence_tert3 = data.frame(Prevalence = prevalence_tert3,
                    TotalAbundance = taxa_sums(PSOtmao1G_tertile3), # the total number of occurrences (even if only observed 50 times in 1 sample, the number would be 50)
                    tax_table(PSOtmao1G_tertile3))

View and sort top 10

cat("Most abundant genus in T1:")
## Most abundant genus in T1:
head(prevalence_tert1[order(prevalence_tert1$TotalAbundance, decreasing= TRUE),], n = 10)
##                                  Prevalence TotalAbundance     Kingdom
## e00f85cb3452e37943500e86afb268f4        119         317886 k__Bacteria
## f966e124604e0e32b209b88df6e42cd4        117         270923 k__Bacteria
## bb1b75f41ff9c9db1d1de41e8388eb52        119         246667 k__Bacteria
## 098c3bbd8234f4ac198297ac0bde957d         51         201995 k__Bacteria
## 5c4ca852b40641b3eb0ad23e69bb6583        118         162882 k__Bacteria
## 42de4e368cca3ff50954f82c12dd5315        117         156693 k__Bacteria
## 1dda53416f231a3345668df39d4ae780        118         102679 k__Bacteria
## 73b4f14d16f02ee0ac82838166128de4        117          70872 k__Bacteria
## e4ae256bb51896c21795f743dc9ed9dd        119          58906 k__Bacteria
## d2a695aa271adf072749a729fd96a731        116          47410 k__Bacteria
##                                             Phylum           Class
## e00f85cb3452e37943500e86afb268f4     p__Firmicutes   c__Clostridia
## f966e124604e0e32b209b88df6e42cd4     p__Firmicutes   c__Clostridia
## bb1b75f41ff9c9db1d1de41e8388eb52  p__Bacteroidetes  c__Bacteroidia
## 098c3bbd8234f4ac198297ac0bde957d  p__Bacteroidetes  c__Bacteroidia
## 5c4ca852b40641b3eb0ad23e69bb6583     p__Firmicutes   c__Clostridia
## 42de4e368cca3ff50954f82c12dd5315     p__Firmicutes   c__Clostridia
## 1dda53416f231a3345668df39d4ae780     p__Firmicutes   c__Clostridia
## 73b4f14d16f02ee0ac82838166128de4     p__Firmicutes   c__Clostridia
## e4ae256bb51896c21795f743dc9ed9dd     p__Firmicutes   c__Clostridia
## d2a695aa271adf072749a729fd96a731     p__Firmicutes   c__Clostridia
##                                              Order              Family
## e00f85cb3452e37943500e86afb268f4  o__Clostridiales  f__Lachnospiraceae
## f966e124604e0e32b209b88df6e42cd4  o__Clostridiales  f__Ruminococcaceae
## bb1b75f41ff9c9db1d1de41e8388eb52  o__Bacteroidales   f__Bacteroidaceae
## 098c3bbd8234f4ac198297ac0bde957d  o__Bacteroidales   f__Prevotellaceae
## 5c4ca852b40641b3eb0ad23e69bb6583  o__Clostridiales  f__Lachnospiraceae
## 42de4e368cca3ff50954f82c12dd5315  o__Clostridiales  f__Ruminococcaceae
## 1dda53416f231a3345668df39d4ae780  o__Clostridiales  f__Lachnospiraceae
## 73b4f14d16f02ee0ac82838166128de4  o__Clostridiales  f__Lachnospiraceae
## e4ae256bb51896c21795f743dc9ed9dd  o__Clostridiales  f__Lachnospiraceae
## d2a695aa271adf072749a729fd96a731  o__Clostridiales  f__Ruminococcaceae
##                                                 Genus Species
## e00f85cb3452e37943500e86afb268f4           g__Blautia    <NA>
## f966e124604e0e32b209b88df6e42cd4  g__Faecalibacterium    <NA>
## bb1b75f41ff9c9db1d1de41e8388eb52       g__Bacteroides    <NA>
## 098c3bbd8234f4ac198297ac0bde957d        g__Prevotella    <NA>
## 5c4ca852b40641b3eb0ad23e69bb6583         g__Roseburia    <NA>
## 42de4e368cca3ff50954f82c12dd5315      g__Ruminococcus    <NA>
## 1dda53416f231a3345668df39d4ae780       g__Coprococcus    <NA>
## 73b4f14d16f02ee0ac82838166128de4       g__Clostridium    <NA>
## e4ae256bb51896c21795f743dc9ed9dd    g__[Ruminococcus]    <NA>
## d2a695aa271adf072749a729fd96a731      g__Oscillospira    <NA>
cat("Most abundant genus in T2:")
## Most abundant genus in T2:
head(prevalence_tert2[order(prevalence_tert2$TotalAbundance, decreasing= TRUE),], n = 10)
##                                  Prevalence TotalAbundance     Kingdom
## e00f85cb3452e37943500e86afb268f4        118         334724 k__Bacteria
## f966e124604e0e32b209b88df6e42cd4        116         270587 k__Bacteria
## bb1b75f41ff9c9db1d1de41e8388eb52        118         218767 k__Bacteria
## 42de4e368cca3ff50954f82c12dd5315        118         189125 k__Bacteria
## 098c3bbd8234f4ac198297ac0bde957d         65         140748 k__Bacteria
## 5c4ca852b40641b3eb0ad23e69bb6583        116         136821 k__Bacteria
## 1dda53416f231a3345668df39d4ae780        117         122306 k__Bacteria
## 73b4f14d16f02ee0ac82838166128de4        118          60612 k__Bacteria
## d2a695aa271adf072749a729fd96a731        118          55299 k__Bacteria
## e4ae256bb51896c21795f743dc9ed9dd        117          44602 k__Bacteria
##                                             Phylum           Class
## e00f85cb3452e37943500e86afb268f4     p__Firmicutes   c__Clostridia
## f966e124604e0e32b209b88df6e42cd4     p__Firmicutes   c__Clostridia
## bb1b75f41ff9c9db1d1de41e8388eb52  p__Bacteroidetes  c__Bacteroidia
## 42de4e368cca3ff50954f82c12dd5315     p__Firmicutes   c__Clostridia
## 098c3bbd8234f4ac198297ac0bde957d  p__Bacteroidetes  c__Bacteroidia
## 5c4ca852b40641b3eb0ad23e69bb6583     p__Firmicutes   c__Clostridia
## 1dda53416f231a3345668df39d4ae780     p__Firmicutes   c__Clostridia
## 73b4f14d16f02ee0ac82838166128de4     p__Firmicutes   c__Clostridia
## d2a695aa271adf072749a729fd96a731     p__Firmicutes   c__Clostridia
## e4ae256bb51896c21795f743dc9ed9dd     p__Firmicutes   c__Clostridia
##                                              Order              Family
## e00f85cb3452e37943500e86afb268f4  o__Clostridiales  f__Lachnospiraceae
## f966e124604e0e32b209b88df6e42cd4  o__Clostridiales  f__Ruminococcaceae
## bb1b75f41ff9c9db1d1de41e8388eb52  o__Bacteroidales   f__Bacteroidaceae
## 42de4e368cca3ff50954f82c12dd5315  o__Clostridiales  f__Ruminococcaceae
## 098c3bbd8234f4ac198297ac0bde957d  o__Bacteroidales   f__Prevotellaceae
## 5c4ca852b40641b3eb0ad23e69bb6583  o__Clostridiales  f__Lachnospiraceae
## 1dda53416f231a3345668df39d4ae780  o__Clostridiales  f__Lachnospiraceae
## 73b4f14d16f02ee0ac82838166128de4  o__Clostridiales  f__Lachnospiraceae
## d2a695aa271adf072749a729fd96a731  o__Clostridiales  f__Ruminococcaceae
## e4ae256bb51896c21795f743dc9ed9dd  o__Clostridiales  f__Lachnospiraceae
##                                                 Genus Species
## e00f85cb3452e37943500e86afb268f4           g__Blautia    <NA>
## f966e124604e0e32b209b88df6e42cd4  g__Faecalibacterium    <NA>
## bb1b75f41ff9c9db1d1de41e8388eb52       g__Bacteroides    <NA>
## 42de4e368cca3ff50954f82c12dd5315      g__Ruminococcus    <NA>
## 098c3bbd8234f4ac198297ac0bde957d        g__Prevotella    <NA>
## 5c4ca852b40641b3eb0ad23e69bb6583         g__Roseburia    <NA>
## 1dda53416f231a3345668df39d4ae780       g__Coprococcus    <NA>
## 73b4f14d16f02ee0ac82838166128de4       g__Clostridium    <NA>
## d2a695aa271adf072749a729fd96a731      g__Oscillospira    <NA>
## e4ae256bb51896c21795f743dc9ed9dd    g__[Ruminococcus]    <NA>
cat("Most abundant genus in T3:")
## Most abundant genus in T3:
head(prevalence_tert3[order(prevalence_tert3$TotalAbundance, decreasing= TRUE),], n = 10)
##                                  Prevalence TotalAbundance     Kingdom
## e00f85cb3452e37943500e86afb268f4        118         319013 k__Bacteria
## f966e124604e0e32b209b88df6e42cd4        117         244405 k__Bacteria
## 42de4e368cca3ff50954f82c12dd5315        117         208080 k__Bacteria
## bb1b75f41ff9c9db1d1de41e8388eb52        118         175200 k__Bacteria
## 098c3bbd8234f4ac198297ac0bde957d         62         156321 k__Bacteria
## 5c4ca852b40641b3eb0ad23e69bb6583        118         142357 k__Bacteria
## 1dda53416f231a3345668df39d4ae780        118         126744 k__Bacteria
## d2a695aa271adf072749a729fd96a731        117          56040 k__Bacteria
## 73b4f14d16f02ee0ac82838166128de4        118          52970 k__Bacteria
## e4ae256bb51896c21795f743dc9ed9dd        115          48650 k__Bacteria
##                                             Phylum           Class
## e00f85cb3452e37943500e86afb268f4     p__Firmicutes   c__Clostridia
## f966e124604e0e32b209b88df6e42cd4     p__Firmicutes   c__Clostridia
## 42de4e368cca3ff50954f82c12dd5315     p__Firmicutes   c__Clostridia
## bb1b75f41ff9c9db1d1de41e8388eb52  p__Bacteroidetes  c__Bacteroidia
## 098c3bbd8234f4ac198297ac0bde957d  p__Bacteroidetes  c__Bacteroidia
## 5c4ca852b40641b3eb0ad23e69bb6583     p__Firmicutes   c__Clostridia
## 1dda53416f231a3345668df39d4ae780     p__Firmicutes   c__Clostridia
## d2a695aa271adf072749a729fd96a731     p__Firmicutes   c__Clostridia
## 73b4f14d16f02ee0ac82838166128de4     p__Firmicutes   c__Clostridia
## e4ae256bb51896c21795f743dc9ed9dd     p__Firmicutes   c__Clostridia
##                                              Order              Family
## e00f85cb3452e37943500e86afb268f4  o__Clostridiales  f__Lachnospiraceae
## f966e124604e0e32b209b88df6e42cd4  o__Clostridiales  f__Ruminococcaceae
## 42de4e368cca3ff50954f82c12dd5315  o__Clostridiales  f__Ruminococcaceae
## bb1b75f41ff9c9db1d1de41e8388eb52  o__Bacteroidales   f__Bacteroidaceae
## 098c3bbd8234f4ac198297ac0bde957d  o__Bacteroidales   f__Prevotellaceae
## 5c4ca852b40641b3eb0ad23e69bb6583  o__Clostridiales  f__Lachnospiraceae
## 1dda53416f231a3345668df39d4ae780  o__Clostridiales  f__Lachnospiraceae
## d2a695aa271adf072749a729fd96a731  o__Clostridiales  f__Ruminococcaceae
## 73b4f14d16f02ee0ac82838166128de4  o__Clostridiales  f__Lachnospiraceae
## e4ae256bb51896c21795f743dc9ed9dd  o__Clostridiales  f__Lachnospiraceae
##                                                 Genus Species
## e00f85cb3452e37943500e86afb268f4           g__Blautia    <NA>
## f966e124604e0e32b209b88df6e42cd4  g__Faecalibacterium    <NA>
## 42de4e368cca3ff50954f82c12dd5315      g__Ruminococcus    <NA>
## bb1b75f41ff9c9db1d1de41e8388eb52       g__Bacteroides    <NA>
## 098c3bbd8234f4ac198297ac0bde957d        g__Prevotella    <NA>
## 5c4ca852b40641b3eb0ad23e69bb6583         g__Roseburia    <NA>
## 1dda53416f231a3345668df39d4ae780       g__Coprococcus    <NA>
## d2a695aa271adf072749a729fd96a731      g__Oscillospira    <NA>
## 73b4f14d16f02ee0ac82838166128de4       g__Clostridium    <NA>
## e4ae256bb51896c21795f743dc9ed9dd    g__[Ruminococcus]    <NA>

Save

write.csv(prevalence_tert1, "../../Outputs/tables/Microbiome/Prevalence_Genus_Tertile1_2201.csv")
write.csv(prevalence_tert2, "../../Outputs/tables/Microbiome/Prevalence_Genus_Tertile2_2201.csv")
write.csv(prevalence_tert3, "../../Outputs/tables/Microbiome/Prevalence_Genus_Tertile3_2201.csv")

Correlate Fecal Microbiome with Phenos

Now that we identified microbiome characteristics that are relevant to TMAO, ask if they

  1. Improve the variance explained between TMAO and a given cardiometabolic marker
  2. Correlate to cardiometabolic phenos
da <- read.csv("../../Outputs/tables/Microbiome/DESeq2_PSOtmao1G_TMAOTertile_220119.csv", row.names = 1)
taxaName_famgen <- da[,"family_genus", drop = F]
taxaName_famgen
##                                                                family_genus
## a0798db0c868e9ad6089502956d68d87  f__Erysipelotrichaceae g__Catenibacterium
## c5859b7f5d14b8c145fbc3ad583c70e8    f__Erysipelotrichaceae g__Coprobacillus
## 5c4ca852b40641b3eb0ad23e69bb6583            f__Lachnospiraceae g__Roseburia
## 8f5ceded1cd7e86b8271d3fab24d322a         f__Lachnospiraceae g__Butyrivibrio
resOTU <- rownames(da)
otuData <- as.data.frame(otu_table(PSOtmao1G))
otuData_desRes <- otuData[rownames(otuData) %in% resOTU,]
otuData_desRes <- as.data.frame(t(otuData_desRes))
# rename for ease downstream
#names(otuData_desRes)[names(otuData_desRes) == "bb1b75f41ff9c9db1d1de41e8388eb52"] <- "f__Bacteroidaceae_g__Bacteroides"
#names(otuData_desRes)[names(otuData_desRes) == "b180a2455b236c103cf0bfe620095736"] <- "f__Peptostreptococcaceae_g__"
#names(otuData_desRes)[names(otuData_desRes) == "e1a2800b24cdf9779b28dc897cddb12a"] <- "f__Christensenellaceae_g__"
#names(otuData_desRes)[names(otuData_desRes) == "8f5ceded1cd7e86b8271d3fab24d322a"] <- "f__Lachnospiraceae_g__Butyrivibrio"

# Update PSOtmao_NA results
names(otuData_desRes)[names(otuData_desRes) == "a0798db0c868e9ad6089502956d68d87"] <- "f__Erysipelotrichaceae_g__Catenibacterium"
names(otuData_desRes)[names(otuData_desRes) == "c5859b7f5d14b8c145fbc3ad583c70e8"] <- "f__Erysipelotrichaceae_g__Coprobacillus"
names(otuData_desRes)[names(otuData_desRes) == "5c4ca852b40641b3eb0ad23e69bb6583"] <- "f__Lachnospiraceae_g__Roseburia"
names(otuData_desRes)[names(otuData_desRes) == "8f5ceded1cd7e86b8271d3fab24d322a"] <- "f__Lachnospiraceae_g__Butyrivibrio"

Next chunk is borrowed from “Characteristics_Diet_Cardiometabolic”

phen <- readRDS("../../data/processed/phenotype/Phenotypes_211108.rds")

# Complete TMAO data only
phen <- phen[complete.cases(phen$tmao == TRUE),]

# Get rid of NA ethnicity category
phen$ethnicity <- droplevels(phen$ethnicity)

# Factor and recode sex
phen$sex <- as.factor(phen$sex)
phen$sex <- plyr::revalue(phen$sex, c("1" = "Male", "2" = "Female"))

# Make TMAO log variable
phen$tmao_log <- log(phen$tmao)

# Quantile
tmao_quantile <- quantile(phen$tmao)
phen$tmao_quantile <- ifelse(phen$tmao <= tmao_quantile[2], "Quantile1",
                             ifelse(phen$tmao > tmao_quantile[2] & phen$tmao <= tmao_quantile[3], "Quantile2",
                                   ifelse(phen$tmao > tmao_quantile[3] & phen$tmao <= tmao_quantile[4], "Quantile3", "Quantile4")))

# Check it
table(phen$tmao_quantile)
## 
## Quantile1 Quantile2 Quantile3 Quantile4 
##        91        91        89        90
phen$tmao_quantile <- as.factor(phen$tmao_quantile)
str(phen$tmao_quantile)
##  Factor w/ 4 levels "Quantile1","Quantile2",..: 2 1 4 1 4 1 1 1 1 3 ...
# Tertile
tmao_tertile <- quantile(phen$tmao, c(0:3/3))
phen$tmao_tertile <- ifelse(phen$tmao <= tmao_tertile[2], "Tertile1",
                             ifelse(phen$tmao > tmao_tertile[2] & phen$tmao <= tmao_tertile[3], "Tertile2", "Tertile3"))
# Factor it
phen$tmao_tertile <- factor(phen$tmao_tertile)
table(phen$tmao_tertile)
## 
## Tertile1 Tertile2 Tertile3 
##      121      120      120
# Checked all variables for normality via shapiro.test() in console. TG, and glc ere the only ones that didn't pass.
phen$glc_bd1_ln <- log(phen$glc_bd1)
phen$tg_bd1_ln <- log(phen$tg_bd1)
phen$crp_bd1_ln <- log(phen$crp_bd1)
phen$tnfa_bd1_ln <- log(phen$tnfa_bd1)
phen$il6_bd1_ln <- log(phen$il6_bd1)

Merge phenotype and microbiome data for linear regression analysis

phen <- merge(phen, otuData_desRes, by=0)
rownames(phen) <- phen[,"Row.names"]
phen <- subset(phen, select = -c(Row.names))

Run the regression!

Old results: f__Bacteroidaceae_g__Bacteroides + f__Peptostreptococcaceae_g__ + f__Christensenellaceae_g__ + f__Lachnospiraceae_g__Butyrivibrio

New results: f__Erysipelotrichaceae_g__Catenibacterium + f__Erysipelotrichaceae_g__Coprobacillus + f__Lachnospiraceae_g__Roseburia + f__Lachnospiraceae_g__Butyrivibrio

# Transfored variable list
transformed_keep_cardio_tbl <- c("sex", "age","bmi_final", "ethnicity", "waistavg_cm", "sysbpavgv2", "diabpavgv2", "endo_rhi", "endo_al75", "hdl_bd1", "ldl_bd1", "chol_bd1", "tg_bd1_ln", "glc_bd1_ln","tnfa_bd1_ln", "il6_bd1_ln", "crp_bd1_ln", "cystatinc_bd1", "choline", "betaine", "carnitine","tmao_log", "f__Erysipelotrichaceae_g__Catenibacterium", "f__Erysipelotrichaceae_g__Coprobacillus", "f__Lachnospiraceae_g__Roseburia", "f__Lachnospiraceae_g__Butyrivibrio")

#~~~~~~~~
# TMAO  
#~~~~~~~~
# Make a smaller df from master (the old data)
phen_sub <- phen[, colnames(phen) %in% transformed_keep_cardio_tbl]
#cvd_biomark <- subset(phen_sub, select = -c(sex, age, ethnicity, cystatinc_bd1, tmao_log, f__Bacteroidaceae_g__Bacteroides, f__Peptostreptococcaceae_g__, f__Christensenellaceae_g__, f__Lachnospiraceae_g__Butyrivibrio)) # change variable here
cvd_biomark <- subset(phen_sub, select = -c(sex, age, ethnicity, cystatinc_bd1, tmao_log, f__Erysipelotrichaceae_g__Catenibacterium, f__Erysipelotrichaceae_g__Coprobacillus, f__Lachnospiraceae_g__Roseburia, f__Lachnospiraceae_g__Butyrivibrio)) # change variable here
n <- ncol(cvd_biomark)

# LCMS method
# Run linear model
my_lm <- lapply(1:n, function(x) lm(cvd_biomark[,x] ~ tmao_log + sex + age + cystatinc_bd1, phen_sub )) # include kcal
#my_lm <- lapply(1:n, function(x) lm(cvd_biomark[,x] ~ ethnicity, phen_sub ))
#my_lm <- lapply(1:n, function(x) lm(cvd_biomark[,x] ~ cystatinc_bd1, phen_sub ))

# Organize results
my_lm2 <- sapply(my_lm, function(x){summary(x)$adj.r.squared})
my_lm2.5 <- sapply(my_lm, function(x){summary(x)$coefficients[2,1]}) # [CVD row, estimate] corresponds to CVD B 
my_lm2.6 <- sapply(my_lm, function(x){summary(x)$coefficients[2,2]}) # [CVD row, Std.Error] corresponds to CVD Std. Error 
my_lm2.7 <- sapply(my_lm, function(x){summary(x)$coefficients[2,3]}) # [CVD row, T value] corresponds to CVD t value
my_lm3 <- sapply(my_lm, function(x){summary(x)$coefficients[2,4]}) # [CVD row, P] corresponds to CVD P value
my_lm361_lcms_cvd <- cbind(my_lm2, my_lm2.5, my_lm2.6, my_lm2.7, my_lm3)
colnames(my_lm361_lcms_cvd) <- c("adj.r.squared", "estimate", "Std.Error", "T.value","P")
row.names(my_lm361_lcms_cvd) <- colnames(cvd_biomark)
res <- as.data.frame(my_lm361_lcms_cvd)

# BH correction
res$p.adj <- stats::p.adjust(res$P, method = "BH", n = n) 

#~~~~~~~~
# TMAO + Bug
#~~~~~~~~
# Make a smaller df from master (the old data)
phen_sub <- phen[, colnames(phen) %in% transformed_keep_cardio_tbl]
#cvd_biomark <- subset(phen_sub, select = -c(sex, age, ethnicity, cystatinc_bd1, tmao_log, f__Bacteroidaceae_g__Bacteroides, f__Peptostreptococcaceae_g__, f__Christensenellaceae_g__, f__Lachnospiraceae_g__Butyrivibrio)) # change variable here
cvd_biomark <- subset(phen_sub, select = -c(sex, age, ethnicity, cystatinc_bd1, tmao_log, f__Erysipelotrichaceae_g__Catenibacterium, f__Erysipelotrichaceae_g__Coprobacillus, f__Lachnospiraceae_g__Roseburia, f__Lachnospiraceae_g__Butyrivibrio)) # change variable here
n <- ncol(cvd_biomark)

# LCMS method
# Run linear model
my_lm <- lapply(1:n, function(x) lm(cvd_biomark[,x] ~ tmao_log + sex + age + cystatinc_bd1 + f__Erysipelotrichaceae_g__Catenibacterium + f__Erysipelotrichaceae_g__Coprobacillus + f__Lachnospiraceae_g__Roseburia + f__Lachnospiraceae_g__Butyrivibrio, phen_sub )) # add bugs

# Organize results
my_lm2 <- sapply(my_lm, function(x){summary(x)$adj.r.squared})
my_lm2.5 <- sapply(my_lm, function(x){summary(x)$coefficients[2,1]}) # [CVD row, estimate] corresponds to CVD B 
my_lm2.6 <- sapply(my_lm, function(x){summary(x)$coefficients[2,2]}) # [CVD row, Std.Error] corresponds to CVD Std. Error 
my_lm2.7 <- sapply(my_lm, function(x){summary(x)$coefficients[2,3]}) # [CVD row, T value] corresponds to CVD t value
my_lm3 <- sapply(my_lm, function(x){summary(x)$coefficients[2,4]}) # [CVD row, P] corresponds to CVD P value
my_lm361_lcms_cvd_des <- cbind(my_lm2, my_lm2.5, my_lm2.6, my_lm2.7, my_lm3)
colnames(my_lm361_lcms_cvd_des) <- c("adj.r.squared", "estimate", "Std.Error", "T.value","P")
row.names(my_lm361_lcms_cvd_des) <- colnames(cvd_biomark)
res_des <- as.data.frame(my_lm361_lcms_cvd_des)

# BH correction
res_des$p.adj <- stats::p.adjust(res_des$P, method = "BH", n = n) 

Save for supplemental material

write.csv(res_des, "../../Outputs/tables/Microbiome/LinReg_cardiometabolic_withDESeq2results_2201.csv")
saveRDS(res_des, "../../Outputs/tables/Microbiome/LinReg_cardiometabolic_withDESeq2results_2201.Rds")

Is TNF-a related to the gut microbiome directly? ## Factored by tnfa Tertile

THIS CHUNK WORK IN PROGRESS

# Quick check
summary(lm(tnfa_bd1_ln ~ f__Erysipelotrichaceae_g__Catenibacterium, phen_sub)) # NS
## 
## Call:
## lm(formula = tnfa_bd1_ln ~ f__Erysipelotrichaceae_g__Catenibacterium, 
##     data = phen_sub)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.75264 -0.20345  0.01658  0.22131  1.36001 
## 
## Coefficients:
##                                            Estimate Std. Error t value Pr(>|t|)
## (Intercept)                               7.071e-01  1.999e-02  35.365   <2e-16
## f__Erysipelotrichaceae_g__Catenibacterium 1.408e-05  3.635e-05   0.387    0.699
##                                              
## (Intercept)                               ***
## f__Erysipelotrichaceae_g__Catenibacterium    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3665 on 353 degrees of freedom
## Multiple R-squared:  0.0004247,  Adjusted R-squared:  -0.002407 
## F-statistic:  0.15 on 1 and 353 DF,  p-value: 0.6988
summary(lm(tnfa_bd1_ln ~ f__Erysipelotrichaceae_g__Coprobacillus, phen_sub)) # NS
## 
## Call:
## lm(formula = tnfa_bd1_ln ~ f__Erysipelotrichaceae_g__Coprobacillus, 
##     data = phen_sub)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.75541 -0.20621  0.01635  0.22043  1.35724 
## 
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                              7.099e-01  2.002e-02  35.459   <2e-16
## f__Erysipelotrichaceae_g__Coprobacillus -8.847e-05  4.276e-04  -0.207    0.836
##                                            
## (Intercept)                             ***
## f__Erysipelotrichaceae_g__Coprobacillus    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3666 on 353 degrees of freedom
## Multiple R-squared:  0.0001212,  Adjusted R-squared:  -0.002711 
## F-statistic: 0.0428 on 1 and 353 DF,  p-value: 0.8362
summary(lm(tnfa_bd1_ln ~ f__Lachnospiraceae_g__Roseburia, phen_sub)) # NS
## 
## Call:
## lm(formula = tnfa_bd1_ln ~ f__Lachnospiraceae_g__Roseburia, data = phen_sub)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.75527 -0.20636  0.01859  0.22085  1.36067 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     7.039e-01  2.702e-02  26.052   <2e-16 ***
## f__Lachnospiraceae_g__Roseburia 4.019e-06  1.506e-05   0.267     0.79    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3666 on 353 degrees of freedom
## Multiple R-squared:  0.0002019,  Adjusted R-squared:  -0.00263 
## F-statistic: 0.07127 on 1 and 353 DF,  p-value: 0.7896
summary(lm(tnfa_bd1_ln ~ f__Lachnospiraceae_g__Butyrivibrio, phen_sub)) # NS
## 
## Call:
## lm(formula = tnfa_bd1_ln ~ f__Lachnospiraceae_g__Butyrivibrio, 
##     data = phen_sub)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.75290 -0.20885  0.01744  0.22294  1.31878 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        0.7073501  0.0196836  35.936   <2e-16 ***
## f__Lachnospiraceae_g__Butyrivibrio 0.0002342  0.0004610   0.508    0.612    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3665 on 353 degrees of freedom
## Multiple R-squared:  0.0007302,  Adjusted R-squared:  -0.002101 
## F-statistic: 0.2579 on 1 and 353 DF,  p-value: 0.6119
# With covariates
summary(lm(tnfa_bd1_ln ~ f__Erysipelotrichaceae_g__Catenibacterium + sex + age + bmi_final, phen_sub)) # NS
## 
## Call:
## lm(formula = tnfa_bd1_ln ~ f__Erysipelotrichaceae_g__Catenibacterium + 
##     sex + age + bmi_final, data = phen_sub)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.65200 -0.21536  0.00781  0.20530  1.30132 
## 
## Coefficients:
##                                             Estimate Std. Error t value
## (Intercept)                                2.534e-01  1.201e-01   2.110
## f__Erysipelotrichaceae_g__Catenibacterium  1.032e-05  3.544e-05   0.291
## sexFemale                                 -5.680e-02  3.782e-02  -1.502
## age                                        5.893e-03  1.373e-03   4.294
## bmi_final                                  8.977e-03  3.878e-03   2.315
##                                           Pr(>|t|)    
## (Intercept)                                 0.0356 *  
## f__Erysipelotrichaceae_g__Catenibacterium   0.7710    
## sexFemale                                   0.1340    
## age                                       2.28e-05 ***
## bmi_final                                   0.0212 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.355 on 350 degrees of freedom
## Multiple R-squared:  0.0701, Adjusted R-squared:  0.05948 
## F-statistic: 6.597 on 4 and 350 DF,  p-value: 3.97e-05
summary(lm(tnfa_bd1_ln ~ f__Erysipelotrichaceae_g__Coprobacillus + sex + age + bmi_final, phen_sub)) # Sig, minor inverse relationship
## 
## Call:
## lm(formula = tnfa_bd1_ln ~ f__Erysipelotrichaceae_g__Coprobacillus + 
##     sex + age + bmi_final, data = phen_sub)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.65306 -0.21560  0.00637  0.20702  1.30148 
## 
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                              2.517e-01  1.205e-01   2.088   0.0375
## f__Erysipelotrichaceae_g__Coprobacillus  5.103e-05  4.157e-04   0.123   0.9024
## sexFemale                               -5.772e-02  3.784e-02  -1.526   0.1280
## age                                      5.880e-03  1.372e-03   4.286 2.35e-05
## bmi_final                                9.105e-03  3.867e-03   2.355   0.0191
##                                            
## (Intercept)                             *  
## f__Erysipelotrichaceae_g__Coprobacillus    
## sexFemale                                  
## age                                     ***
## bmi_final                               *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3551 on 350 degrees of freedom
## Multiple R-squared:  0.06992,    Adjusted R-squared:  0.05929 
## F-statistic: 6.578 on 4 and 350 DF,  p-value: 4.101e-05
summary(lm(tnfa_bd1_ln ~ f__Lachnospiraceae_g__Roseburia + sex + age + bmi_final, phen_sub)) # NS
## 
## Call:
## lm(formula = tnfa_bd1_ln ~ f__Lachnospiraceae_g__Roseburia + 
##     sex + age + bmi_final, data = phen_sub)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.65386 -0.21536  0.01029  0.20793  1.30170 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      2.509e-01  1.210e-01   2.074   0.0388 *  
## f__Lachnospiraceae_g__Roseburia  1.976e-06  1.465e-05   0.135   0.8928    
## sexFemale                       -5.705e-02  3.787e-02  -1.507   0.1328    
## age                              5.880e-03  1.372e-03   4.286 2.35e-05 ***
## bmi_final                        9.050e-03  3.869e-03   2.339   0.0199 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3551 on 350 degrees of freedom
## Multiple R-squared:  0.06993,    Adjusted R-squared:  0.0593 
## F-statistic: 6.579 on 4 and 350 DF,  p-value: 4.095e-05
summary(lm(tnfa_bd1_ln ~ f__Lachnospiraceae_g__Butyrivibrio + sex + age + bmi_final, phen_sub)) # NS
## 
## Call:
## lm(formula = tnfa_bd1_ln ~ f__Lachnospiraceae_g__Butyrivibrio + 
##     sex + age + bmi_final, data = phen_sub)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.65274 -0.21524  0.00943  0.20466  1.27282 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         0.2493843  0.1204311   2.071   0.0391 *  
## f__Lachnospiraceae_g__Butyrivibrio  0.0001722  0.0004506   0.382   0.7026    
## sexFemale                          -0.0572223  0.0377585  -1.515   0.1306    
## age                                 0.0058291  0.0013763   4.235 2.92e-05 ***
## bmi_final                           0.0092340  0.0038819   2.379   0.0179 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.355 on 350 degrees of freedom
## Multiple R-squared:  0.07027,    Adjusted R-squared:  0.05964 
## F-statistic: 6.613 on 4 and 350 DF,  p-value: 3.859e-05

The family christensenellaceae is widely inversely associated with BMI. A review is here.

#summary(lm(f__Christensenellaceae_g__ ~ bmi_final + sex + age, phen))